# PRE-CUEING EFFECTS ON PERCEPTION, ATTENTION, AND COGNITIVE PENETRABILITY

EDITED BY: Athanassios Raftopoulos and Gary Lupyan PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2018 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-460-0 DOI 10.3389/978-2-88945-460-0


# **PRE-CUEING EFFECTS ON PERCEPTION, ATTENTION, AND COGNITIVE PENETRABILITY**

Topic Editors: **Athanassios Raftopoulos,** University of Cyprus, Cyprus **Gary Lupyan,** University of Wisconsin, United States

Attention has often been likened to spotlights and filters, devices that illuminate or screen out some inputs in favor of others. This largely passive conception of attention has been gradually replaced by a view of attention as a more dynamic and far-reaching process. We know that attentional processes augment neural processing at all levels and, in some cases, augment processing within the sense organs themselves. For example, cueing object features (e.g., instructing a subject to look at a screen for a red object) modulates prestimulus activity in the visual cortex.

Far from being limited to space or basic features, such attention cueing can function in surprisingly flexible and complex ways: people can be cued to attend to various objects, properties, and semantic categories and such attention appears to directly involve perceptual mechanisms.

Studies of spatial attention cues presented before stimulus presentation show early modulation of perceptual processing. This phenomenon refers to the enhancement of the baseline activity of neurons at all levels in the visual cortex that are tuned to the cued location, which is called attentional modulation of spontaneous activity. The spontaneous firing rates of neurons are increased when attention is shifted toward the location of an upcoming stimulus before its presentation.

Evidence also suggests that through pre-cueing of object features, feature-based attention modulates prestimulus activity in the visual cortex. The effects of pre-stimulus feature attention act either as a preparatory activity to enhance the stimulus-evoked potentials within feature sensitive areas, or they act so as to modulate stimulus-locked transients.

Both effects of pre-cueing reflect a change in background neural activity. They are called anticipatory effects because they are established prior to the presentation of the stimulus. Thus, they do not modulate processing during stimulus viewing but bias the process before it starts via an increase in baseline firing rates; they rig up perceptual processing without affecting it on-line.

Moreover, recent work on perceptual processing emphasizes the role of the brain as a predictive tool. To perceive is to use what you know to explain away the sensory signal across multiple spatial and temporal scales. Perception aims to enable perceivers to interact with their environment successfully. Success relies on inferring or predicting correctly (or nearly so) the nature of the source of the incoming signal from the signal itself, an inference that may well be Bayesian.

Current research sheds light on the role of attention in inferring the identities of distal objects. Attention within late vision contributes to testing hypotheses concerning the putative distal causes of the sensory data encoded in the lower neuronal assemblies in the visual processing hierarchy. This testing takes the form of matching predictions, made on the basis of a hypothesis, about the sensory information that the lower levels should encode if the hypothesis is correct, with the actual sensory information currently encoded at the lower levels. To this end, attention enhances the activity of neurons in the cortical regions that encode the stimuli that most likely contain information relevant to the testing of the hypothesis.

In this Research Topic we aim to answer two related questions: First, what are the differences between these sorts of pre-cueing effects and top-down cognitive influences on perception, and, in general, how do such attentional cuing effects relate to the broader literature on top-down influences on perception? Second, given that attention appears to change perceptual processing and that one form of attention, namely cognitively driven (or endogenous, or sustained) attention, is a cognitive process, does attentional modulation through pre-cueing constitute cognitive penetrability of perception? Addressing these two questions will shed light on the theoretical underpinnings of cognitive penetrability and the nature of perceptual processing.

**Citation:** Raftopoulos, A., Lupyan, G., eds. (2018). Pre-cueing Effects on Perception, Attention, and Cognitive Penetrability. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-460-0

# Table of Contents


Nikki A. Lammers, Edward H. de Haan and Yair Pinto

*Perception and Cognition Are Largely Independent, but Still Affect Each Other in Systematic Ways: Arguments from Evolution and the Consciousness-Attention Dissociation*

Carlos Montemayor and Harry H. Haladjian


Rachel Wu and Jiaying Zhao

*The Effects of Spatial Endogenous Pre-cueing across Eccentricities*

Jing Feng and Ian Spence

# Editorial: Pre-cueing Effects on Perception and Cognitive Penetrability

#### Athanassios Raftopoulos <sup>1</sup> \* and Gary Lupyan<sup>2</sup>

*<sup>1</sup> Department of Psychology, University of Cyprus, Nicosia, Cyprus, <sup>2</sup> Lupyan Lab, Psychology, University of Wisconsin-Madison, Madison, WI, United States*

Keywords: pre-cueing, attention, cognitive penetrability, mental imagery, early vision

**Editorial on the Research Topic**

#### **Pre-cueing Effects on Perception and Cognitive Penetrability**

Attention has often been likened to spotlights and filters that illuminate or screen out some inputs in favor of others. This largely passive conception of attention has been gradually replaced by a view of attention as a dynamic and far-reaching process. Attention augments neural processing at all levels. Attention contributes to testing hypotheses concerning the distal causes of the sensory data encoded in the lower neuronal assemblies. This testing takes the form of matching predictions, made on the basis of a hypothesis, about the sensory information that the lower levels should encode if the hypothesis is correct, with the actual sensory information encoded at the lower levels. To this end, attention enhances or sharpens the activity of neurons in the cortical regions that encode the stimuli that most likely contain information relevant to this testing.

Concerning pre-cueing, studies of spatial and feature/object attention cues show early modulation of pre-stimulus activity in the visual cortex. Attention cueing can function in flexible and complex ways: people can be cued to attend to various objects, properties, and semantic categories, and such attention appears to directly involve early perceptual mechanisms. The pre-stimulus modulation consists in an enhancement of the baseline activity of neurons in the visual cortex that are tuned to the cued location or code the cued feature(s).

In this Research Topic, we aim to answer two questions: First, how do attentional cuing effects relate to top–down influences on perception? Second, given that in pre-cueing cognitively driven attention appears to change perceptual processing, does the pre-cueing attentional modulation entail the cognitive penetrability of perception? Addressing these two questions will shed light on the theoretical underpinnings of cognitive penetrability and the role of attention.

Feng and Spence examine how endogenous spatial pre-cues influence the allocation of attention in the periphery of the visual field. They present two experiments that examine how the expectation of the target's location shapes the distribution of attention across various eccentricities. Their findings suggest that spatial pre-cueing results in higher target detection rates and that the detection rate is higher when the target occurs in the cued direction. These findings provide evidence for the cognitive penetrability of early vision.

Lammers et al. distinguish two conceptions of cognitive penetrability. In the broad sense, attention and memory are not pre- and post-perceptual systems but parts of the mechanisms by which top-down processes influence perception. In the narrow sense, cognitive penetrability only occurs when top–down factors are flexible and cause an illusion. Since one cannot be cognitively trained to see and unsee illusions, illusions cannot be driven by cognition in the narrow sense. However, most research focuses on foveal vision that is too unambiguous for cognitive factors to control perception. Illusions in more ambiguous peripheral visual perception could offer a different insight into this problem.

#### Edited and reviewed by:

*Snehlata Jaswal, L. M. Thapar School of Management, India*

#### \*Correspondence:

*Athanassios Raftopoulos raftop@ucy.ac.cy*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *07 November 2017* Accepted: *12 February 2018* Published: *27 February 2018*

#### Citation:

*Raftopoulos A and Lupyan G (2018) Editorial: Pre-cueing Effects on Perception and Cognitive Penetrability. Front. Psychol. 9:230. doi: 10.3389/fpsyg.2018.00230*

Wu and Zhao focus on prior knowledge of object associations as an aspect of attentional selection and review recent studies demonstrating that how objects are selected depends on the participant's prior experience with other objects associated with the target. Thus, prior knowledge of the test and related stimuli acquired before or during the task impacts performance since it affects attentional selection and information acquisition. Wu and Zhao do not discuss whether the effects of prior knowledge of object associations on perception constitute cases of cognitive penetrability.

Montemayor and Haladjian argue that the opposing views that cognitive penetration is pervasive and that there is a fundamental distinction between cognition and perception, which precludes cognitive penetration, are too extreme, but both theories have merits and empirical support. To address this puzzle, they discuss a theoretical approach that incorporates the merits of these two views into a broader and more nuanced explanatory framework, the consciousness and attention dissociation framework that they have developed in previous work.

Lupyan addresses two arguments that aim to exclude attentional effects from counting as evidence for the cognitive penetrability of perception: that attention is a post-perceptual process reflecting selection between fully constructed perceptual representations, and that attention is a pre-perceptual process that selects the input to encapsulated perceptual systems. Lupyan argues that although some attentional effects can be construed as post-perceptual, and spatial attention can be seen as selecting the input, other forms of attention operate so as to change perceptual content across the entire visual hierarchy; attention is one of the mechanisms by which cognition affects perception.

Fazekas and Nanay focus on pre-cueing effects in early vision. They argue that the claim that pre-cueing studies show that perception is cognitively penetrated by means of attentional mechanisms is problematic. They argue, however, that pre-cueing studies show that perception is cognitively penetrated via mental imagery. Cue-induced mental imagery provides a channel through which cognitive states can exert effects on perception that fulfill the requirements of cognitive penetration.

Gross notes that Pylyshyn argues that cognitively driven attentional effects do not amount to cognitive penetration of early vision because such effects occur either before or after early vision. Critics object that such effects occur at all levels of perceptual processing, but Gross supports Pylyshyn's claim. Even if Pylyshyn's critics are correct that attentional effects are not external to early vision, these effects do not satisfy Pylyshyn's requirements that, for cognitive penetration to occur, the effects be direct and exhibit semantic coherence.

Gatzia and Brogaard argue that it is usually assumed that covert endogenous attention differs significantly from overt endogenous attention. However, studies indicate that the oculomotor system is activated when covert attention is directed to an uncued location suggesting that covert endogenous attention may involve attentional shifts, albeit less apparent than the shifts in overt attention. The differences in the perceptual outputs could, thus, be attributed to selectively attending to a different object or a different feature of the same object. The effects of covert attention, then, can be attributed either to processes that resemble perceptual learning or attentional shifts that are not cases of cognitive penetration.

Finally, Raftopoulos defends the cognitive impenetrability of early vision in view of pre-cueing effects. He discusses the problems that cognitive penetrability causes for the epistemic role of perception in grounding perceptual beliefs and he argues that perceptual processes are cognitively penetrable if the cognitive effects undermine their epistemic role. He argues then that the cognitive effects that act through pre-cueing do not undermine the epistemic role of early vision and, also, they do not affect early vision directly; early vision is cognitively impenetrable.

The chapters in this volume show why the effects of attention on perceptual processing in general and the nature of pre-cueing in particular have attracted so much attention in the last two decades. The ever-increasing empirical literature is very rich and amenable to a variety of interpretations and, thus, its implications are hotly debated both in Philosophy and the Cognitive Sciences.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Raftopoulos and Lupyan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Changing What You See by Changing What You Know: The Role of Attention

#### Gary Lupyan\*

Department of Psychology, University of Wisconsin–Madison, Madison, WI, USA

Attending is a cognitive process that incorporates a person's knowledge, goals, and expectations. What we perceive when we attend to one thing is different from what we perceive when we attend to something else. Yet, it is often argued that attentional effects do not count as evidence that perception is influenced by cognition. I investigate two arguments often given to justify excluding attention. The first is that attention is a post-perceptual process reflecting selection between fully constructed perceptual representations. The second is that attention is a pre-perceptual process that simply changes the input to encapsulated perceptual systems. Both of these arguments are highly problematic. Although some attentional effects can indeed be construed as post-perceptual, others operate by changing perceptual content across the entire visual hierarchy. Although there is a natural analogy between spatial attention and a change of input, the analogy falls apart when we consider other forms of attention. After dispelling these arguments, I make a case for thinking of attention not as a confound, but as one of the mechanisms by which cognitive states affect perception, by going through cases in which the same or similar visual inputs are perceived differently depending on the observer's cognitive state, and instances where cuing an observer using language affects what one sees. Lastly, I provide two compelling counter-examples to the critique that although cognitive influences on perception can be demonstrated in the laboratory, it is impossible to really experience them for oneself in a phenomenologically compelling way. Taken together, the current evidence strongly supports the thesis that what we know routinely influences what we see, that the same sensory input can be perceived differently depending on the current cognitive state of the viewer, and that phenomenologically salient demonstrations are possible if certain conditions are met.

Keywords: perception, attention, top–down processing, knowledge, bistable perception, ambiguous figures, cognitive penetrability

# INTRODUCTION

The debate over whether cognition affects perception is in full swing (Stokes, 2013; Lupyan, 2015a; Raftopoulos, 2015a; Zeimbekis and Raftopoulos, 2015; Firestone and Scholl, 2016; Ogilvie and Carruthers, 2016; Teufel and Nanay, 2016). Is what we perceive influenced by our current goals, knowledge, and expectations (e.g., Hohwy, 2013; Goldstone et al., 2015; Lupyan, 2015a; Teufel and Nanay, 2016)? Or is perception composed of encapsulated systems, following their own laws and logic, independent of what the perceiver knows and their current cognitive state (e.g., Pylyshyn, 1999; Orlandi, 2014; Firestone and Scholl, 2016)? The debate spans a variety of issues from how to distinguish cognition from perception to what counts as knowledge to whether the empirical target should be objective behavior on perceptual tasks or subjective perceptual appearance. All are important questions. The present paper focuses on two aspects of the debate. First, should attentional effects on perception count as instances of cognitive penetrability of perception (CPP)? Second, what is the connection between effects of attention on perception and effects of various kinds of cues on perception? Is cuing perception "just" cuing attention? And if so, what does it tell us about CPP?

#### Edited by:

Hanne De Jaegher, University of the Basque Country, Spain

#### Reviewed by:

Peter Fazekas, Aarhus University, Denmark Valerio Santangelo, University of Perugia, Italy

> \*Correspondence: Gary Lupyan lupyan@wisc.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 22 December 2016 Accepted: 27 March 2017 Published: 01 May 2017

#### Citation:

Lupyan G (2017) Changing What You See by Changing What You Know: The Role of Attention. Front. Psychol. 8:553. doi: 10.3389/fpsyg.2017.00553

The crux is this: The same sensory input or set of inputs can produce different perceptual experiences depending on the attentional state of the viewer. Since attention is a cognitive process (see Is Attention Really Cognitive?), attentional effects ought to constitute prima facie evidence that perception is cognitively penetrable. Yet many have argued that demonstrating that cognition really influences perception needs to exclude the possibility of the effect being merely attentional (e.g., Pylyshyn, 1999; Macpherson, 2012; Deroy, 2013; Raftopoulos, 2015a; Firestone and Scholl, 2016). After describing the background and rationale of this argument, I try to make explicit some of the assumptions on which it rests, and argue that these assumptions are contradicted by what we know about how attention works. I then go through a number of demonstrations of how the same sensory inputs can be perceived in different ways and discuss the relationships between effects of attention, effects of background knowledge, and effects of cues on perception.

## Is Attention Really Cognitive?

Perhaps the most obvious reason for thinking that attention, that is, the process of attending, is a cognitive process is that when presented with some sensory input it is possible to volitionally choose what we attend to. It is also possible to instruct someone to attend to one thing versus another, with immediate consequences for what the viewer ends up seeing (Mack and Rock, 1998; Ward and Scholl, 2015). As with many aspects of our cognition, attention is not under complete volitional control. Certain salient sensory events such as a sudden appearance of an object may cause people to automatically attend to the event whether they want to or not (Theeuwes, 2004; Theeuwes et al., 2004). Relatedly, attending to the same salient target has been shown to become easier when it is repeated (priming of pop-out), a process at one time thought to be similarly automatic and not penetrable to an observer's expectations or goals (Maljkovic and Nakayama, 1994).<sup>1</sup>

Vision scientists once thought that it was possible to produce a set of features that are the targets of attentional mechanisms. In the visual domain, dimensions such as spatial frequency and motion direction do appear to be better targets for attentional selection than more complex attributes (Wolfe and Horowitz, 2004) and can thus be fairly viewed as "basic." However, attempts to derive a complete set of features that form the targets of attentional selection and which divide pre-attentive perception from post-attentive perception have not been successful (e.g., Wolfe, 1998). Recent work has demonstrated that attention is not limited to any closed set of (ostensibly non-semantic) perceptual features such as spatial frequency and orientation in the case of vision, but extends to clearly semantic attributes such as our knowledge of letters (Nako et al., 2014a), words (Dell'Acqua et al., 2007), and common objects (e.g., Lupyan, 2008; Lupyan and Spivey, 2010; Nako et al., 2014b). That people can attend to such clearly semantic categories means that attention makes use of learned object knowledge, making it impossible to reduce attention to a process of selection of basic non-semantic features (see also Goldstone and Barsalou, 1998; Schyns et al., 1998).

## Does Attention Really Affect What We See?

Attending to different things has far-reaching effects on perception. At its most basic, cuing someone to attend to the left makes it easier to see what is on the left (Posner et al., 1980). Such spatial attention is often the sole focus in discussions of attention and CPP (Macpherson, 2012; Deroy, 2013), but it is also possible to attend to features in parallel across the visual field, with the effect of improved ability to locate task-relevant stimuli (Maunsell and Treue, 2006), and, as further discussed in Section "Cuing Perception: Attention as a Mechanism by Which Knowledge Affects Perception", to attend to semantic categories (Lupyan, 2008; Çukur et al., 2013; Nako et al., 2014a; Boutonnet and Lupyan, 2015).

Attending not only improves objective performance, but in some cases demonstrably changes subjective perception, enhancing contrast (Carrasco et al., 2004), saturation (Fuller and Carrasco, 2006), and changing perceived size of attended stimuli (Gobell and Carrasco, 2005). Failing to attend to something in the right way can make the difference between seeing and not seeing (hence the term 'inattentional blindness') (Mack and Rock, 1998; Ward and Scholl, 2015).

Attentional influences are observed "early" in both place within the visual hierarchy and time, arguably precluding the existence of truly pre-attentive perception (Foxe and Simpson, 2002; Reynolds and Chelazzi, 2004; Hayden and Gallant, 2005). Although once controversial, it is now common knowledge that attention permeates perceptual processing through and through: from at least the thalamus in the case of mammalian vision (Reynolds and Chelazzi, 2004; Jack et al., 2006; Silver et al., 2007) and down to the cochlea in the case of audition (Smith et al., 2012). We can now say with certainty that many forms of attention work by altering the response profiles of neurons that respond to sensory inputs, thereby altering (at least during certain temporal windows) visual representations (Gandhi et al., 1999; Lamme and Roelfsema, 2000; Corbetta and Shulman, 2002; Ghose and Maunsell, 2002; Maunsell and Treue, 2006; Silver et al., 2007). Although the present paper cannot do justice to the vast literature on the perceptual effects of attention (see Carrasco, 2011 for review), it would not be an exaggeration to say that no part of perceptual processing is immune from attentional effects.

<sup>1</sup> Subsequent work has shown that even such putatively automatic attentional guidance is modulated by the viewer's expectations (Leonard and Egeth, 2008; Pascucci et al., 2012) and task relevance of the dimensions to be attended (Wolfe et al., 2003; Fecteau, 2007). More generally, the original formulation of perceptual salience in terms of sensory salience (Itti and Koch, 2000) is being supplanted by formulations that incorporate semantic factors into computations of salience (e.g., Nyström and Holmqvist, 2008; Wolfe et al., 2011; Wu et al., 2014; Santangelo et al., 2015).

# WHY SOME BELIEVE ATTENTIONAL EFFECTS DO NOT COUNT AS EVIDENCE OF COGNITIVE PENETRABILITY OF PERCEPTION

And so, we have the following curious situation: attention, a cognitive process, affects perception. What we perceive when we attend to one thing is different from what we perceive when we attend to another thing. Yet, it is frequently argued that attentional effects do not count as cases of cognitive penetrability of perception (Pylyshyn, 1999; Macpherson, 2012; Deroy, 2013; Raftopoulos, 2015b; Firestone and Scholl, 2016). The next two sections describe two main reasons for excluding attentional effects from being considered cases of CPP: attention as something that happens after perception, and attention as something that happens before perception.

## Attention as a Post-perceptual Process

The first reason for denying that attentional effects count as evidence of CPP is to view attention as a process of selection happening after perceptual processing (often referred to as late-selection; **Figure 1A**). On such a view, perceptual processing may proceed in the same way regardless of what we are attending, with attention determining what contents are selected from perception. For example, Palmer et al. (1993) ask "to what extent attention affects perception rather than memory and decision?" As an illustration of a kind of attention that is well-characterized by post-perceptual selection, imagine someone scanning the walls of an art gallery trying to find the Picassos. To accomplish this, the visual system must process each painting to a sufficient degree so that, at minimum, Picassos can be distinguished from the rest. If one assumes that our knowledge of what Picassos look like resides outside of the visual system, then the best the visual system can do is deliver a 'percept' to whatever downstream system has the requisite knowledge. That system can in turn send a signal to examine the painting further, reject it outright as an obvious non-match, and so on. A classic example of a situation often characterized in just such a way is the process of attending to a conversation in a noisy room. Although we may have the impression that we are listening only to the voices of the people we are conversing with, on hearing our name, our focus of attention may suddenly be jerked away to another corner of the room. For this to happen, we must have been processing the ambient speech all along, at least to the level of distinguishing one's name from all other words. Notably, such recognition of unattended conversation is hardly ubiquitous, happening only about a third of the time, and more so in people with poorer working memory (Conway et al., 2001). More generally, the locus of selection is not fixed, but depends on factors like perceptual and attentional load (e.g., Lavie and Tsal, 1994). Findings like these helped resolve the longstanding debates between early and late-selection (Lavie, 2005).

Still, to the extent that attention sometimes just selects stimuli that have already received full perceptual processing, "a subtle form [of] choosing what to perceive" (Macpherson, 2012), one may conclude that it is therefore of little relevance to questions about effects on perception itself.

FIGURE 1 | Three ways of construing the relationship between attention, perception, and cognition. In all cases, cognitive states can influence what we perceive by literally changing the input, for example, via eye-movements. (A) Attention as selection that works on the output of a pre-attentive perceptual processing module. Attention construed in this way can be relevant to CPP insofar as perceptual behaviors that one is interested in (e.g., being aware of what one sees) require attention. (B) Attention as a pre-perceptual filter or spotlight that shapes input to perception. Attention construed in this way is relevant to CPP insofar as the filters are not limited to content-neutral dimensions such as location, but influence processing in a content-specific (semantically coherent) manner. (C) A more general construal of attention as a modulator of perception (symbolized by the symbol for convolution). Some perceptual processes may involve more attentional modulation than others. Cognitive states can influence perception via attention or in other ways. Both routes constitute genuine cases of CPP insofar as the influence is semantically coherent rather than content-neutral.

## Attention as a Pre-perceptual Process

One reason why many researchers studying perception are so interested in attention is that it involves modulation of perception rather than just a process of selecting amongst fully processed perceptual states. It is possible to be cued (by an experimenter or by oneself) to attend to a particular place, feature, or category, with the result of being objectively better at perceiving. Not just remembering, not just better knowing what to do, but perceiving better (Carrasco, 2011).

Critics, however, have argued that although such effects clearly count as evidence of attention (a cognitive process) changing what we perceive, they do not count as cases of CPP because attention simply changes the input to the perceptual system. On this view, attention is something that happens before perception (**Figure 1B**). The perceptual system then goes on responding to the altered input in a reflexive and modular way encapsulated from the viewer's knowledge, goals, and expectations. This argument is very clearly expressed by Firestone and Scholl (2016) who argue that attentional effects can be equated to more obvious changes in input like closing or moving one's eyes:

. . .there is a trivial sense in which we all can willfully control what we visually experience, by (say) choosing to close our eyes (or turn off the lights) if we wish to experience darkness. Though this is certainly a case of cognition (specifically, of desire and intention) changing perception, this familiar "top-down" effect clearly isn't revolutionary, insofar as it has no implications for how the mind is organized — and for an obvious reason: closing your eyes (or turning off the lights) changes only the input to perception, without changing perceptual processing itself.

. . .changing what we see by selectively attending to a different object or feature . . . seems importantly similar to changing what we see by moving our eyes (or turning the lights off). In both cases, we are changing the input to mechanisms of visual perception, which may then still operate inflexibly given that input.

# Attention as Confound versus Attention as Mechanism

To summarize the argument thus far: there are two broad objections to including attentional effects as instances of CPP. The first objection is that attending to something involves selecting among already formed perceptual representations (**Figure 1A**). The second objection is that attention simply changes the input to perception. This, of course, changes what we see, but only because of a difference in input (**Figure 1B**). A related proposal is that attention "rigs up" perception without altering it (Raftopoulos, 2015b).

The first objection, that attention is post-perceptual selection, faces two problems. First, although it may indeed be accurate to characterize some attentional effects in this way, it is abundantly clear that much of attention is not simply selection and operates by augmenting perceptual processing itself. Second, regardless of how "late" the attentional effect in question may be occurring and how complete the perceptual processing of unattended stimuli may be, one may wish to nevertheless include such cases as candidates for CPP if they concern behaviors that we wish to count as truly perceptual. For example, even if it could be shown that the unattended gorilla (Simons and Chabris, 1999) is fully processed, its phenomenological invisibility may be relevant if we wish to include being aware of what one sees as part of perception.

To understand why the second objection (attention as a change in input) is compelling to some, and where it ultimately goes wrong, we need to examine some of its underlying assumptions. The objection rests on an analogy between a change in input caused by a change to the sensors, e.g., moving one's eyes to the left to better see what is on the left, or squinting to blur out some details to see the larger picture, and changes in input caused by endogenous attentional mechanisms. The analogy is at least partially justified for spatial attention. Just as moving our eyes toward a target helps us see it, we have long known that shifting attention covertly, without moving one's eyes, can likewise lead to perceptual improvements (Posner, 1980). Covertly attending to a spatial location enhances spatial resolution, improving performance on tasks that benefit from enhanced spatial resolution (Yeshurun and Carrasco, 1998). Covert attentional shifts are closely correlated with eye movements (e.g., Hart et al., 2013) and share common neural mechanisms. For example, electrical stimulation of the frontal eye fields can evoke both saccadic eye movements to specific locations and attentional shifts.<sup>2</sup> Such findings make it sensible, at first glance, to conclude that perceptual changes due to attention are just like those caused by changes to eye gaze. As we shall see, the analogy quickly breaks down when we go beyond spatial attention. The domain of spatial attention, however, allows us to better understand why a change in input (whether by moving one's eyes or moving covert attention) would not constitute CPP. The reason is that the change in perception caused by such a change in input is not content sensitive. Insofar as looking to the left helps us see things on the left solely due to a change in what light now enters the eyes, it will be equally helpful for everything that is on the left. This improvement is independent of whether our intention was to look to check for oncoming cars or for pedestrians. In the literature on CPP, this is broadly referred to as a lack of semantic coherence between the cognitive state and the resulting percept (Pylyshyn, 1999; see Lupyan, 2015c; Stokes, 2015 for discussion).

As I will argue below, although some types of attentional shifts may lack semantic coherence, this is not the case for other kinds of attentional effects. It is one thing to find that attending to the left adds visual detail to anything on the left. But it is quite another to discover that one can attend to a certain object or object category with the perceptual consequence being changed perception of the content that is being attended. Note that even if one argued that the reason that attending to, e.g., cars helps one see cars better is through a change in input to perception, such a change would have to involve a content-specific change and is thus a qualitatively different kind of effect than simply seeing

<sup>2</sup> Such findings led researchers to formulate the premotor theory of spatial attention on which saccades are covert attentional shifts writ large (see Thompson et al., 2005 for a critical review).

better anything in a particular location. This point is discussed in greater detail below, but first I would like to illustrate how easy it can be to confuse confounds with mechanisms when thinking about CPP.

#### A Mini Case-Study of Confusing Confounds and Mechanisms

In an earlier version of the argument that attentional effects are simple changes in input, Fodor (1988, p. 191) uses the following imagined dialog to draw an analogy between changing one's percepts by changing where one attends and changing one's heart rate by doing physical exercise:

a: Heart rate is cognitively penetrable! I can choose the rate at which my heart beats.

b: Remarkable; how do you do it?

a: Well, when I want it to beat faster, I touch my toes a hundred times. And when I want it to beat slower, I take a little nap.

b: Oh.

According to Fodor, it is just as silly to argue that attentional effects count as instances of CPP as it is to argue that changing heart rate through exercise counts as a cognitive effect on heart rate. But why does speeding up heart rate by doing some toe touches fail as an argument for heart rate being cognitively penetrable? Because—one assumes—the 100 toe touches would speed up heart-rate to the same extent regardless of whether one's intention was to speed up the heart rate or to stretch one's hamstrings. There is a lack of semantic coherence. But consider that it is also possible to speed one's heart rate simply by thinking certain thoughts. No toe touches required (Manuck, 1976; Peira et al., 2013). But suppose that the way one influences heart rate is by thinking about doing exercise. Does this qualify as heart rate being cognitively penetrable? If not, why not? One may argue that it is actually the thoughts about exercise that are causing the heart rate increase rather than the thoughts about increasing one's heart rate. But this is a strange objection. Perhaps thoughts about exercise are the mechanism by which we can cognitively regulate our heart rate.

For argument's sake, let us assume that thinking about exercise is hacking the heart-rate control system and so does not count as a true cognitive influence. Consider then the following case. Pollo et al. (2003) showed that administering a placebo analgesic reduced the perceived pain of an electric shock to the forehead while also reducing the subject's heart rate. In other words: when subjects had a placebo-induced belief of being administered a pain-killer, they not only experienced less pain, but also showed a decrease in heart rate. On investigating the mechanism underlying this effect, Pollo et al. (2003) discovered that administering an opioid antagonist negated the placebo's effect on both pain and heart rate, suggesting that the placebo-induced expectation of pain relief produced a release of endogenous opioids which had the effect of reducing pain and heart rate, an effect that blocking the opioid receptors could negate.

At this point, a critic may point out that it wasn't really the subject's cognitive state that reduced their heart rate, but rather the endogenous opioids. But if a person's beliefs and expectations (which are themselves physical states) are to have an effect on some physiological response, it must happen through some mechanism or another! The endogenous opioids released as a result of the placebo are not a confound. They are part of the mechanism by which placebo analgesics work. It could have turned out that the mechanism is different (and indeed, Pollo et al. describe a different mechanism for placebo effects on ischemic arm pain). And so, it's the same with attention. To the extent that attention is a key mechanism of how perception performs its function of "providing a description that is useful to the viewer" (Marr, 1982, p. 31), to exclude attentional effects from consideration as cases of CPP is to confuse confounds with mechanisms.

# PERCEIVING THE SAME INPUT IN DIFFERENT WAYS: ATTENTIONAL AND KNOWLEDGE-BASED INFLUENCES

In this section, I delve into some details of the interplay between perception, attention, and higher-level cognitive states (**Figure 1C**). My main focus will be on cases of bistable or ambiguous perception as they allow us to keep the physical stimulus the same while changing the observer's knowledge and/or attentional state. In Section "The Role of Attention and Knowledge in the Perception of Simple Ambiguous Figures", I discuss some of the ways that attention and prior knowledge influence our perception of bistable images. Some of these may be dismissed as "just" changes in input or post-perceptual selection, but others cannot be. In Section "What Makes Some Perceptual Interpretations Better Than Others?" I sketch in broad strokes a way of thinking about what makes some perceptual interpretations better than others and how attention and knowledge can make a particular interpretation more or less "good." In Section "Cuing Perception: Attention as a Mechanism by Which Knowledge Affects Perception", I use the framework developed in Section "What Makes Some Perceptual Interpretations Better Than Others?" to discuss how in-the-moment attentional cues influence what we see, and argue for attention as one of the mechanisms by which knowledge affects perception.

# The Role of Attention and Knowledge in the Perception of Simple Ambiguous Figures

If perception is cognitively penetrable, we should be able to find cases where the same physical input can be perceived differently depending on the cognitive state of the perceiver. The existence of ambiguous or bistable images of the kind shown in **Figure 2** provides a natural starting point. That visual bistability is a perceptual phenomenon is supported by both the phenomenological potency of viewing bistable displays and by studies of its neural correlates (e.g., Tong et al., 1998; Meng and Tong, 2004; Kornmeier and Bach, 2005, 2012).

That there are images that can be perceived in multiple ways is not necessarily relevant to the CPP thesis. Consider what is

perhaps the best known example of bistability—the Necker cube (**Figure 2A**). The Necker cube can be perceived as extending in depth in two mutually exclusive ways: as if the viewer is looking at it from the top or the bottom. The same 2-dimensional image has two different three-dimensional interpretations indicating that there is a many-to-one projection between 2-dimensional images and 3-dimensional objects. The situation becomes relevant to the CPP thesis if the same 2-dimensional image can evoke a different 3-dimensional interpretation depending on the viewer's cognitive state.

An effective way of inducing a switch between the two interpretations of the Necker cube is to look at different parts of the image. For example, looking at the right cross in **Figure 2A** causes most viewers to perceive the cube as if looking at it from the top, while looking at the left cross causes most viewers to perceive the alternate perspective. Of course, this is utterly unconvincing as a demonstration of CPP. The reason is an apparent lack of semantic coherence. One assumes that looking at the left or right cross would bias perception in the same way regardless of the viewer's cognitive state.

But it turns out that the intention to see the cube in one way or another not only independently affects which 3-dimensional interpretation one sees, but has a considerably larger effect on the interpretation than where one looks (Hochberg and Peterson, 1987; Toppino, 2003; Meng and Tong, 2004; see also Peterson and Hochberg, 1983; Liebert and Burk, 1985; Peterson, 1986). What about covert attention? In contrast to eye movements, covert attentional shifts are more difficult to control. One solution is to present a viewer with a very small Necker cube for which covert or overt attentional shifts ought to be less consequential. Toppino (2003) carried out this experiment and found that the viewer's intentions had equally large effects when viewing a cube in which the critical areas spanned less than 1° of visual angle compared to a cube an order of magnitude larger.<sup>3</sup>

Recall that what makes the Necker cube ambiguous is that the 2-dimensional rendering of the cube is equally compatible with two 3-dimensional interpretations. The fluidity with which we construct 3-dimensional percepts from 2-dimensional inputs makes it tempting to think that although which 3-dimensional percept we see at a given time can be influenced by expectations and task-demands, the generation of the percepts themselves is not subject to our knowledge and expectations (Pylyshyn, 1999). But is this true? The very ability to see shapes like the Necker cube as being 3-dimensional is not hardwired. It depends on having had sufficient visual experience (Gregory and Wallace, 1963). For example, one individual who regained sight after being blind between the ages of 3 and 43 described the Necker cube as a "square with lines" (Fine et al., 2003). Still, it may be argued that given sufficient visual experience, the visual system matures sufficiently to allow the process of computing depth from 2-dimensional cues to function in an automatic bottom-up way free of further influences of knowledge and expectations. But this is not so. Both images in **Figure 2B** are 2-dimensional and composed of all the same elements. To the "early" visual system the two objects should look much the same. Yet, the left object is readily seen as a 3-dimensional espresso maker casting a shadow while the object on the right continues to look 2-dimensional (Moore and Cavanagh, 1998). The 3-dimensional interpretation that is competing (and in this case, quickly winning) when viewing the espresso maker is simply unavailable for the object on the right until one gains appropriate experience such as glimpsing an enriched grayscale depiction that makes its 3-dimensionality easier to see (see also Sinha and Poggio, 1996).

**Figure 2C** shows another example of an ambiguous image of striking simplicity. When shown this image, the majority (28/50) of participants recruited from Amazon Mechanical Turk reported seeing a 2-dimensional figure (Lupyan, unpublished data). Of these, 61% described it in terms of lines and angles and 39% described it in terms of higher level units such as a staircase or sideways alphanumeric characters: an L and two Zs, or two Zs and a 7. But there is a 3-dimensional alternative that was apparent to the remaining 22/50 observers: an embossed letter E. It is possible, of course, that the ability of the latter group to perceive the alternate interpretation is strictly due to differences in perceptual experiences. Perhaps people who see the embossed letter are those who have previously seen many more embossed letters and therefore are better at recognizing them. But when a separate group of 50 participants were presented with the very same image and informed that it was possible to see it as a letter, about 92% were able to see the embossed E, showing that, controlling for prior perceptual experience, simple verbal cues can affect what people see.

**Figure 2D** shows a different kind of ambiguity. Here, the bistability is between two meanings that can be constructed from the same visual input by assigning the same contours and features to different parts: the chin in one alternative is the nose in the other. Switching between the two interpretations can be aided by selectively attending to different parts of the image, but can also be accomplished by nonvisual cues, e.g., hearing a voice of a young woman prior to seeing the display (Hsiao et al., 2012). **Figure 2E** poses a similar problem to **Figure 2D** except that seeing the alternative to the initially dominant parrot requires a more significant restructuring of the scene. The alternatives are now not between two kinds of faces, but between a typical-looking parrot and a very atypical woman in body paint. After accomplishing this restructuring, the viewer now has a second stable interpretation that can begin to compete with the initial interpretation (Scocchia et al., 2014). An interesting and to my knowledge untested possibility is that it is only after this initial restructuring that the second interpretation becomes a target for effective attentional selection.

Another example of a basic visual process being affected by knowledge is shown in **Figures 2F,G**. Lest our visual system be limited to processing a small number of fixed inputs, it is critical to have a way of parsing inputs into constituent (and generative) parts. The most basic way to parse a visual input is by distinguishing the figure from the ground. How can we tell what is the figure and what is the ground in **Figure 2F**? The solution originally conceived by the Gestalt psychologists is to formulate a set of perceptual 'laws' (or biases) such as: objects occupy less area than the ground, objects are generally enclosed and form contiguous regions, objects often have symmetrical contours. Notice that none of these make any mention of object meaning and do not take into account prior experience with the candidates for object-hood. As predicted by these Gestalt grouping principles, in **Figure 2F**-left it is easier to see the center black region as the figure than to see the white "surround" as the figure. In **Figure 2F**-right, the situation is more ambiguous; the black and white regions appear to make equally good figures. But consider what happens when the figures are rotated by 180° (**Figure 2G**). The Gestalt dispreferred regions now appear as figure in **Figure 2G**-left, while in **Figure 2G**-right, the white and black regions are now unambiguously perceived as figure and ground, respectively (Peterson et al., 1991). This basic finding and the subsequent work by Peterson and colleagues

<sup>3</sup>Discussing the Necker Cube, Deroy (2013) writes: "Trivially speaking, two persons confronted to the same visual object in the same illumination conditions may not perceive the same thing because they don't look at it in the same way." If by "don't look at it in the same way" Deroy means that people attend to different parts of the image, then the argument does not square with the empirical literature because people do in fact see different things even if they attend to the same regions. If by "don't look at it in the same way" Deroy means that people look at the same image with different expectations and this affects what they see, then that sure sounds like cognition affecting perception.

(Peterson and Gibson, 1994; Trujillo et al., 2010; Cacciamani et al., 2014) provides an obvious challenge to explaining figure-ground segregation using perceptual laws that are not sensitive to content. The relevance of such findings to CPP is that they show that figure-ground segregation does not operate in a content-neutral way and is sensitive to at least some aspects of meaning (see Peterson, 1994; Vecera and O'Reilly, 1998 for discussion).<sup>4</sup> Results such as these also challenge accounts on which perception proceeds through a series of serial operations with earlier ones informationally encapsulated from the results of later ones. Indeed, the idea that object knowledge affects figure-ground segregation appears downright paradoxical if one assumes that the process of figure-ground segregation is what provides the input to later object recognition processes (see also Lupyan and Spivey, 2008; Kahan and Enns, 2014). But finding that recognition can precede and influence such "earlier" perceptual processes is exactly what one would expect if the goal of vision is to provide the viewer with a useful representation of the input (Marr, 1982), and to do so as quickly as possible (Bullier, 1999).

**Figure 2H** provides another example of the role that prior knowledge can play in constructing meaning from an otherwise meaningless visual input. Often called "Mooney images" (Mooney, 1957), such two-tone images can be seen perfectly well, but the majority of people, most of the time, are unable to perceive anything of meaning in the image.<sup>5</sup> In the case of **Figure 2H**, approximately 10% of viewers spontaneously perceive the meaningful object. The situation changes dramatically when people are provided with a verbal hint. Told that there is a musical instrument in the image, about 40% quickly see the trumpet. Such verbal cues not only improve recognition, but have additional perceptual consequences. Perceiving the image as meaningful helps people perform a simple perceptual task: determining whether two Mooney images are identical or not. These behavioral improvements were related to differences in early visual processing (specifically, larger amplitudes of the P1 EEG signal, Samaha et al., 2016; see also Abdel Rahman and Sommer, 2008). Contra Pylyshyn's (1999, p. 357) statement that "verbal hints [have] little effect on recognizing fragmented figures", we find that not only do verbal hints greatly enhance recognition, but they facilitate visual discrimination.

# What Makes Some Perceptual Interpretations Better Than Others?

Despite the important differences between the cases shown in **Figures 2A–H**, there is something to be gained by attempting to unify them through the lens of perception as an inferential process, a process of generating and testing hypotheses (Gregory, 1970; Barlow, 1990; Rao and Ballard, 1999; von Helmholtz, 2005; Enns and Lleras, 2008; see also Clark, 2013; Hohwy, 2013 for overviews). For example, at the level of object representations, the Necker Cube generates three hypotheses: (a) a 2-dimensional collection of lines, (b) a 3-dimensional cube extending up, and (c) a 3-dimensional cube extending down. Hypothesis (a) is dispreferred because it leaves too much unexplained. Accepting (a) would mean that the angles and lines are arbitrary. Hypotheses (b) and (c) offer a simpler description: what explains the arrangement of the lines is that they correspond to a cube. These two hypotheses are equally good at accounting for the arrangement of the lines, but yield mutually exclusive percepts and as a result begin to oscillate (see Hohwy, 2013 for general discussion; see Rumelhart et al., 1986; Haken, 1995; Sundareswara and Schrater, 2008 for examples of computational models).

In the language of predictive coding, for someone with normal viewing history, hypothesis (a) has higher surprisal (lower 'goodness') than hypotheses (b) or (c). In **Figures 2F,G**, segregating the figure from the ground should take object semantics into account because semantics affects the likelihood that a given feature corresponds to an actual object. We can apply the same principles of predictive coding to better understand what is happening in **Figures 2C,H**. Representing these as a meaningless collection of arbitrary lines results in a less compressible representation than representing them as meaningful objects (see Pickering and Clark, 2014 for a discussion of the relationship between predictive coding and compressibility). This attempt to 'explain away' sensory inputs in as compact a way as possible is a common foundation of the various predictive-coding models of perception (van der Helm, 2000; Huang and Rao, 2011; Friston et al., 2012), with preference for simplicity going well beyond perception (Chater and Vitanyi, 2003; Feldman, 2003).
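The comparison among the three Necker cube hypotheses can be made concrete with a toy calculation. The sketch below is purely illustrative and is not a model drawn from the works cited: the likelihood values and the names `LIKELIHOODS`, `surprisal_bits`, and `posterior` are assumptions chosen only to show that a hypothesis treating the lines as arbitrary carries higher surprisal, while the two cube hypotheses tie.

```python
import math

# Hypothetical likelihoods p(image | hypothesis); illustrative values only,
# not estimates from any experiment.
LIKELIHOODS = {
    "a_flat_lines": 0.01,  # arbitrary lines rarely fall into this arrangement
    "b_cube_up": 0.40,
    "c_cube_down": 0.40,
}


def surprisal_bits(probability):
    """Surprisal in bits: less probable inputs are more surprising."""
    return -math.log2(probability)


def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in likelihoods}
    evidence = sum(joint.values())
    return {h: value / evidence for h, value in joint.items()}


# Assume no hypothesis is favored before the image is seen.
uniform_prior = {h: 1 / len(LIKELIHOODS) for h in LIKELIHOODS}
surprisals = {h: surprisal_bits(p) for h, p in LIKELIHOODS.items()}
beliefs = posterior(uniform_prior, LIKELIHOODS)

for h in LIKELIHOODS:
    print(f"{h}: surprisal={surprisals[h]:.2f} bits, posterior={beliefs[h]:.3f}")
```

Under these assumed numbers, hypotheses (b) and (c) receive identical posterior weight, which is one simple way to picture why neither can permanently win and the percept alternates.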

In attempting to 'explain away' **Figures 2C,H**, however, a hypothesis corresponding to meaningful objects is simply unavailable to most people. As soon as one becomes available, e.g., as a result of a verbal hint, the hypothesis dominates perception and we see the previously meaningless collection of lines as a meaningful percept (an embossed E, a trumpet) (see Christiansen and Chater, 2015 for a discussion of this same idea of continuous re-coding of input into chunks in the domain of language processing).

Although beyond the scope of this paper, it is worth noting that Gestalt principles and other "laws of perception" are not in conflict with theories focusing on minimization of prediction error. The latter theories can be seen as attempting to explain perceptual laws in more general terms. For example, a perceptual "law" such as common fate (wherein separate features all moving together against a background are likely to be grouped into a single object) can be thought of as minimizing surprisal/prediction error by positing a hypothesis that the moving parts can be predicted by a single cause: their belonging to one object. This hypothesis is preferable to the more complex alternative (corresponding to higher surprisal/prediction error)

<sup>4</sup>Firestone and Scholl (2015) reinterpret the Peterson and Hochberg (1983) finding by arguing that the differences between figure-ground assignment in the familiar and unfamiliar orientations "don't involve effects of knowledge per se [because] inversion eliminates this effect even when subjects know the inverted shape's identity" (see Section 2.5 of their paper). This argument confuses different senses of knowledge. We may know in an intellectual sense that an upside-down outline of a woman is still an outline of a woman, but despite this intellectual knowledge it is still harder to recognize the white shapes as a woman's silhouette in **Figure 2F** than in **Figure 2G**. The harder the recognition, the less effect the object representation can have on the figure-ground segregation process as it unfolds.

<sup>5</sup>One may speculate that people's difficulty with making sense of such images is analogous to the problem faced by individuals with associative object agnosia when they attempt to make sense of more conventional images (Farah, 1990).

of there being multiple independent causes to the common motion. Our resulting percept of a single moving object is the phenomenological consequence of that simpler hypothesis being preferred.
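The common-fate argument can be illustrated with a small calculation. This is a sketch under strong simplifying assumptions that are not in the source: motion is restricted to a fixed number of equally likely discrete directions, and under the multiple-cause alternative each part picks its direction independently. The function names are hypothetical.

```python
import math


def surprisal_single_cause(n_directions):
    """Bits needed to explain the motion if one object sets the direction
    for all of its parts (one choice among equally likely directions)."""
    return math.log2(n_directions)


def surprisal_independent_causes(n_parts, n_directions):
    """Bits needed if every part has its own independent cause
    (one choice per part among equally likely directions)."""
    return n_parts * math.log2(n_directions)


# Example: 5 features moving together, 8 possible directions (assumed values).
single = surprisal_single_cause(8)
independent = surprisal_independent_causes(5, 8)
print(f"single cause: {single:.1f} bits, independent causes: {independent:.1f} bits")
```

Under these assumptions the gap between the two hypotheses grows with every additional part that shares the motion, which matches the intuition that grouping becomes more compelling as more features move together.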

Learning to associate certain visual inputs with meaningful categories (faces, letters, espresso makers, body-painted women, trumpets, etc.) makes these richer hypotheses available as potential alternatives. We (our visual system) can evaluate the likelihood that an input corresponds not just to a visual object, but to a trumpet, or the letter E. These alternatives are preferred to the extent that they offer stronger predictive power, explaining, for example, the observed placement of the various visual features. Allowing vision to benefit from these higher-level hypotheses helps make meaning out of noise.

# Cuing Perception: Attention as a Mechanism by Which Knowledge Affects Perception

In discussing how visual knowledge can lead observers to perceive the same sensory input in different ways, I conflated two kinds of effects of cognition on perception. The first concerns the finding that previous experience with what letters, faces, and various objects look like can influence the operation of even basic perceptual processes like figure-ground segregation and construction of 3-dimensional structure. The second is that it is sometimes possible to change what one sees through various cues. For example, the likelihood that people perceive **Figure 2C** as a single three-dimensional object is affected by being told that it is possible to see it as a letter. Some critics of CPP contend that the first type of effect (sometimes called "diachronic penetrability", see McCauley and Henrich, 2006 for discussion) is not really evidence of CPP because it merely shows that such visual knowledge has become incorporated into the visual system over time, at which point it (apparently) no longer counts as cognitive. I will forego discussing this rather odd argument. Instead, in this section, I elaborate on the second kind of effect, sometimes called synchronic penetrability, wherein similar or even identical inputs are perceived in different ways depending on the cognitive state of the viewer at the time the input is perceived (see also Klink et al., 2012).

One way to change perception is by using a perceptual cue. For example, to help people see the embossed E in **Figure 2C**, one can cue them with a conventional letter "E" (**Figure 3**, top row). To bias people to see the young woman in **Figure 2D**, one can cue them with a biased version of the figure (**Figure 3**, middle row), and to help people see the trumpet in the Mooney image in **Figure 2H**, one can cue them with a more conventional picture of a trumpet (**Figure 3**, bottom row) or else trace out the outline of the trumpet in the original image.<sup>6</sup> If such perceptual cues were the only way to affect how an ambiguous or underdetermined image is perceived, such cueing effects would be of little relevance to CPP. But there are other ways of cueing perception. For example, I suspect that simply hearing "Eeee" immediately prior to or during seeing **Figure 2C** would increase the likelihood of perceiving it as a single three-dimensional letter E. Similarly, an auditory cue (the voice of a younger or older woman) biased people to perceive the younger or older woman in **Figure 2D**, respectively, an effect that was additive with effects of spatial attention (Hsiao et al., 2012). Finally, although not empirically tested to my knowledge, it is conceivable that hearing a trumpet sound can help people see the trumpet in **Figure 2H**.

Such cross-modal effects are sometimes excluded from counting as instances of CPP because, it is argued, they merely show automatic influences of one perceptual modality on another: an intraperceptual effect (e.g., Pylyshyn, 1999, sect. 7.1) rather than an effect of cognition on perception. Drawing such an intraperceptual boundary strikes me as self-defeating, for it would mean that knowledge of what an "E" sounds like, of male and female voices, and of musical instruments would all become part of the perceptual system. Someone holding the view that audiovisual integration does not count as CPP may point to the findings of Alsius and Munhall (2013), which show that audiovisual integration can occur in the absence of conscious awareness of the visual stimulus (see also Faivre et al., 2014), as evidence that such integration occurs in a completely automatic way. But this automaticity is not inevitable: even conventional audiovisual integration can be interfered with by having participants engage in an attentionally demanding task (Alsius et al., 2005).

We need not restrict ourselves to literal perceptual cues. The next two columns of **Figure 3** show examples of general and more specific linguistic cues to perception. As mentioned above, being informed that it was possible to see a letter in **Figure 2C** more than doubled the likelihood of people seeing the embossed E, a result that one can speculate would only increase if one were given more precise information via language about what the letter was. In the old-woman/young-woman case, many people quickly see the ambiguity. Still, the fact that naïve viewers can be biased toward one interpretation or the other purely through language, and that linguistic instructions continue to be effective in biasing one's perception (e.g., Hsiao et al., 2012), speaks to the power of language to guide perception in the absence of any overt perceptual cues. In the case of the Mooney image depicted in the last row of **Figure 3**, even superordinate linguistic cues like "animal" and "musical instrument" aid in recognition of the images. More specific cues (e.g., the word "trumpet") are predictably more effective (Samaha et al., 2016). In other work, we have shown that hearing a verbal cue affects visual processing within 100 ms of visual onset (Boutonnet and Lupyan, 2015), results that we interpret as showing that verbal cues activate visual representations, establishing "priors" that change how subsequent stimuli are processed (Edmiston and Lupyan, 2015, 2017; Lupyan and Clark, 2015).

Here, one may again ask whether the power of language to guide and bias perception is due to changing the input to perception via attention. The answer is that it depends on the cue. A location cue like "LEFT" is highly effective in changing where someone attends (Hommel et al., 2001), but because its effect is (presumably) content neutral, it is possible to think of it as merely a change in input. Other linguistic cues, however, have much richer semantic content: hearing "dog" helps people perceive dogs (Lupyan and Ward, 2013; Boutonnet and Lupyan, 2015). One may argue that such effects simply show that language is a good way to "rig up" perception (Raftopoulos, 2015b). But rather than being an alternative to CPP, such an argument speaks to the mechanism by which language has its effects. The fact of the matter remains that a person presented with the same sensory input can perceive it in different ways depending on a word they had previously heard (see Lupyan, 2015a for review).

# THE "WOW" FACTOR: WHEN CAN WE REALLY SEE OUR KNOWLEDGE IMPACTING PERCEPTION?

The evidence for the various ways in which knowledge affects perception keeps growing. Here is a brief sampling: knowledge of how arms and legs are attached to torsos affects perceived depth from binocular disparity information (Bulthoff et al., 1998). Knowledge that bricks are harder than cheese affects amodal completion (Vrins et al., 2009). Recovery of depth from 2-dimensional images depends in part on object recognition (Moore and Cavanagh, 1998, **Figure 2B**), as does the arguably more basic process of figure-ground segregation (Peterson, 1994 for review, see also **Figures 2F,G**). Scene knowledge affects perception of edge orientations (Neri, 2014). Knowledge of the real-world size of, e.g., a basketball affects apparent speed of motion (by altering perception of distance) (Martín et al., 2015). Knowledge of usual object colors shades our color perception (Hansen et al., 2006; Olkkonen et al., 2008; Witzel et al., 2011; Kimura et al., 2013; Witzel, 2016) and influences the vividness of color afterimages (Lupyan, 2015b). Meaningfulness of printed words affects their perceived sharpness and influences our ability to detect changes in sharpness (Lupyan, 2017b). Hearing the right word can make visible something that is otherwise invisible (Lupyan and Ward, 2013).

In spite of this evidence and the cases described in Section "The Role of Attention and Knowledge in the Perception of Simple Ambiguous Figures", some critics of CPP remain unmoved. One reason for the continued resistance is that many of these results lack the "wow" factor common to many well-known illusions designed to demonstrate the workings of the visual system. For example, Firestone and Scholl (2015, 2016) ask why, if what we know changes what we see, it is so hard to find cases where one can really see these effects for oneself. As a comparison of what it means to see a visual effect for oneself, consider our perception of how bright something is. Naively, one might suppose that it depends simply on the amount of light reflected by a surface (i.e., its luminance). That this is not so can be plainly seen in an illusion like the Adelson (1993) Checkerboard in which two surfaces with the same luminance are perceived to have very different brightness.<sup>7</sup>

<sup>6</sup>That perception can be cued in this way may seem obvious, but we still lack an understanding of how a single perceptual hint can induce a long-term change in the ability to perceive stimuli like the Mooney image in **Figure 2H** as meaningful.

In this last section I will attempt to explain why demonstrations of CPP tend to be less compelling than conventional visual illusions.<sup>8</sup> I then provide a recipe for creating phenomenologically compelling demonstrations of CPP and show two examples.

What makes Adelson's Checkerboard so compelling as an illusion is that it is possible to prove to the observer that it is indeed an illusion by making the perceived difference in lightness vanish right before the person's eyes. Joining the two patches or masking the context, for example, allows observers to see that their perception of one patch as being much lighter than the other was being produced by factors other than the patches' luminance. Compared to the level of control we have over the factors that induce such illusions, our ability to control the cognitive state of the viewer is far more limited. For example, consider the finding that an objectively achromatic picture of a banana looks yellower than a meaningless color patch (Hansen et al., 2006). If perceived color is truly influenced by knowledge of the object's canonical color (i.e., reflects our memory of previous experiences with the object), then turning off one's knowledge that one is looking at a banana should affect perceived color. That would make for a compelling demonstration! But it's not possible to turn knowledge on and off in this way. So what can we do instead?

One solution is to manipulate the strength of the association between the input stimulus and stored representations.<sup>9</sup> In **Figure 2B**, people readily construct a 3-dimensional representation of a 2-dimensional image when it corresponds to a recognizable object, but not when its low-level features are rearranged into a novel image (Moore and Cavanagh, 1998). The difference in 3-dimensional structure is apparent, but the two stimuli are too different from one another to allow for easy comparison. This has the effect of reducing the "wow" factor because to the viewer it just appears that one of the stimuli is 3-dimensional and the other is not. It does not feel like the difference is caused by one's knowledge. A method that further minimizes physical changes to the sensory inputs while attempting to manipulate knowledge is simple image rotation (e.g., **Figures 2F,G**, Peterson, 1994). Turning an object upside down maintains all of its low-level visual properties, but weakens its association with a stored higher-level representation (assuming that the object or scene is typically encountered in a canonical orientation). Another way to manipulate knowledge is through cuing. For example, cuing people with an object's name can enhance the contribution of prior knowledge on perception (Lupyan, 2012; Boutonnet and Lupyan, 2015). A cue can help bias one interpretation over another of ambiguous objects of the kind shown in **Figure 2**. In instances like **Figures 2C,E,H**, it can even introduce new interpretations. But almost by definition, such ambiguous objects tend to be lousy examples of the cued categories. Although our phenomenology of **Figure 2H** is arguably different when we perceive the trumpet, the change is not nearly as phenomenologically compelling as the best visual illusions because the change from a collection of meaningless contours to a collection of contours making up a sketchy outline of a trumpet is too small to elicit a "wow." 
The situation is somewhat better in **Figure 2C** because the alternative made accessible by the "there is a letter here" cue explains more of the unexplained variation.

To maximize the "wow factor" would require a stimulus that is easily seen as one thing and then, given the right cue, can be seen as a good example of something else. In the language of predictive coding, the initial stimulus ought to yield low surprisal, but following a cue the surprisal should increase, causing the visual system to reorganize the image into a new percept with low surprisal. Two such cases are shown in **Figure 4**.

**Figure 4A** (Plait et al., 2016) shows an apparently perfectly normal brick wall. There does not appear to be anything ambiguous or atypical about it. But being informed of an alternative interpretation changes that. The new interpretation (see endnote) makes the original interpretation a poorer fit to the data (i.e., increases its surprisal) while simultaneously making the new interpretation a better fit to the data. And so, on learning of the new interpretation, our percept is altered. I find it next to impossible to now see the image as I initially saw it (interestingly, rotating the image seems to partly disrupt the effect of the newly acquired knowledge).

**Figure 4B** (Krishna, 2016) is another compelling demonstration. Here, people appear to be split on what they initially see (see endnote for the description of the two interpretations). But perhaps because the two interpretations differ considerably in how they account for what is happening with the two legs, and because both interpretations offer such good, but mutually exclusive, accounts of the sensory data, the resulting phenomenological switch when one is cued to the alternative (or discovers it on one's own) tends to be more compelling than in the cases of bistability shown in **Figure 2**.
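The predictive-coding reading of these two demonstrations can be made concrete with a toy calculation. The sketch below is only one possible way of cashing out "surprisal" here: it treats a cue as a change in the prior over two candidate interpretations of the brick-wall image and measures how surprising it would be to hold each interpretation given the image, as the negative log of its posterior probability. All numbers, the hypothesis labels, and the function names are hypothetical and chosen purely for illustration; they are not estimates from any experiment.

```python
import math


def posterior(priors, likelihoods):
    """Apply Bayes' rule over a finite set of hypotheses.

    priors:      dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> p(image | hypothesis)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


def surprisal_bits(p):
    """Surprisal, in bits, of an outcome with probability p."""
    return -math.log2(p)


# Hypothetical values: the "cigar" reading explains the image features
# better, but is almost never considered without a cue.
LIKELIHOODS = {"crack": 0.2, "cigar": 0.8}
PRIORS_NO_CUE = {"crack": 0.99, "cigar": 0.01}
PRIORS_CUED = {"crack": 0.5, "cigar": 0.5}

before = posterior(PRIORS_NO_CUE, LIKELIHOODS)
after = posterior(PRIORS_CUED, LIKELIHOODS)

for h in LIKELIHOODS:
    print(
        f"{h}: surprisal before cue = {surprisal_bits(before[h]):.2f} bits, "
        f"after cue = {surprisal_bits(after[h]):.2f} bits"
    )
```

Under these assumed numbers, the uncued viewer settles on the "crack" reading because its prior dominates. Once the cue makes the alternative available, the better-fitting "cigar" reading takes over and the original reading becomes the more surprising one. The same arithmetic suggests why ambiguous figures that are poor examples of the cued category produce weaker effects: if the two likelihoods are close, the cue shifts the posterior far less.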

# SUMMARY AND CONCLUSION

Perhaps the simplest way to test the proposition that what we know influences what we see is to find cases where the same sensory input can be perceived in different ways depending on one's cognitive state, such as what one knows or expects. Findings that attention, a cognitive process, has strong influences on every aspect of perception would seem to provide prima facie evidence for cognitive penetrability of perception (CPP). Yet, critics of CPP have discounted attentional effects, arguing that they either reflect post-perceptual selection among fully realized perceptual representations, or pre-perceptual processes that change the input to perception but not perception itself (**Figure 1**). I have argued that although some attentional effects may well be post-perceptual, others are clearly not (Sections "Does Attention Really Affect What We See?" and "Attention as a Post-Perceptual Process"). Some types of spatial attention may indeed be similar to genuine changes in input: attending to the left may be similar to looking to the left in that both improve processing of whatever is on the left regardless of content or the cognitive state that drove the attentional shift. Such attentional effects lack semantic coherence and critics are correct to exclude them from counting as examples of CPP. Other attentional effects, however, do show semantic coherence in that the attentional state is sensitive to content (see The Role of Attention and Knowledge in the Perception of Simple Ambiguous Figures) and so should count as genuine instances of CPP. In Sections "The Role of Attention and Knowledge in the Perception of Simple Ambiguous Figures," "," and "Cuing perception: Attention as a mechanism by which knowledge affects perception," I discussed cases where the same (or similar) visual inputs are perceived differently depending on the observer's knowledge (**Figure 2**), and the ability to cue knowledge using both perceptual and nonperceptual linguistic cues (**Figure 3**). I then discussed some of the reasons why it is often difficult to experience knowledge and cues affecting perception in a phenomenologically compelling way (see The "wow" Factor: When Can We Really See Our Knowledge Impacting Perception?). Lastly, I provided some arguably compelling examples of being able to see for oneself how knowledge can affect what one sees (**Figure 4**).

<sup>7</sup>The Adelson checkerboard illusion can be viewed at http://web.mit.edu/persci/people/adelson/checkershadow\_illusion.html

<sup>8</sup>For brevity and to maintain focus on attention, I have avoided discussing the argument that perception is cognitively impenetrable because knowing about an illusion does not (necessarily) make it go away. For discussion, see Lupyan (2015a, Section 5.1) and Lupyan (2017a, Section 6.3).

<sup>9</sup>This leads to the prediction that a more realistic banana should activate our color knowledge to a greater degree than a less realistic banana (a point frequently lost in philosophical treatments of CPP that tend to think of knowledge as all or none, e.g., Deroy, 2013). Indeed, the memory color effects are stronger when a viewer is presented with a more realistic grayscale image (Olkkonen et al., 2008; see also Lupyan, 2015b for an effect of weakening associations by turning the image upside down on the perceived vividness of color afterimages).

Taken together, the evidence licenses several conclusions. First, it is not possible to characterize all attentional effects as non-semantic changes in input of the kind that occur when we look at one location versus another. Rather, attention can and often does operate over dimensions that we normally think of as reflecting meaning, and these attentional effects should be counted as genuine instances of CPP. Second, the possibility of exogenously cueing one's knowledge in real time to bias how something is perceived strongly suggests that under normal circumstances what we see reflects our endogenous cognitive state. Third, to understand why these effects often lack the "wow factor" common to the best visual illusions, it is useful to view them through the lens of predictive coding. Knowledge ought to change what we see to the extent that it provides a better hypothesis of the sensory data.

# ETHICS STATEMENT

The study described in Section "The Role of Attention and Knowledge in the Perception of Simple Ambiguous Figures" was exempted by the University of Wisconsin–Madison Social and Behavioral Science IRB owing to the anonymity of the participants recruited via Amazon Mechanical Turk and the minimal risk posed by the task of reporting their perception of ambiguous figures.

# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

# ACKNOWLEDGMENTS

This work was partially funded by NSF PAC 1331293 to the author and was prepared while the author was in residence at the Max Planck Institute for Psycholinguistics.

alternative interpretations of each image.

<sup>10</sup> The "crack" in the wall is not a crack. It is the tip of a cigar which is stuck into the wall, extending outward.

<sup>11</sup> Many viewers will see two shiny legs because the combination of visual cues is consistent with an interpretation of skin gloss. But there is an equal or better interpretation: the legs are covered with streaks of white paint.

# REFERENCES

fpsyg-08-00553 April 27, 2017 Time: 15:27 # 13



Witzel, C. (2016). An easy way to show memory colour effects. Iperception 7, 1–11.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Lupyan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Cognitive Penetration and Attention

#### Steven Gross\*

Department of Philosophy, Johns Hopkins University, Baltimore, MD, USA

Zenon Pylyshyn argues that cognitively driven attentional effects do not amount to cognitive penetration of early vision because such effects occur either before or after early vision. Critics object that in fact such effects occur at all levels of perceptual processing. We argue that Pylyshyn's claim is correct—but not for the reason he emphasizes. Even if his critics are correct that attentional effects are not external to early vision, these effects do not satisfy Pylyshyn's requirements that the effects be direct and exhibit semantic coherence. In addition, we distinguish our defense from those found in recent work by Raftopoulos and by Firestone and Scholl, argue that attention should not be assimilated to expectation, and discuss alternative characterizations of cognitive penetrability, advocating a kind of pluralism.

Keywords: cognitive penetration, attention, perception, top–down, expectation

#### Edited by:

Athanassios Raftopoulos, University of Cyprus, Cyprus

#### Reviewed by:

Duncan Guest, Nottingham Trent University, UK; John Zeimbekis, University of Patras, Greece

> \*Correspondence: Steven Gross sgross11@jhu.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 11 August 2016 Accepted: 06 February 2017 Published: 22 February 2017

#### Citation:

Gross S (2017) Cognitive Penetration and Attention. Front. Psychol. 8:221. doi: 10.3389/fpsyg.2017.00221

# INTRODUCTION

What we think can affect what we see. For example, if you want some chocolate and think it is to your left, you might turn to look that way. What you now see will differ from what you saw before: a half-eaten bar on the counter rather than the empty cupboard. In this case, a kind of attention plays a mediating role: what you think causes you to change the orientation of your gaze, which in turn has obvious effects on what you see.

What does this show about the relation between cognition (that is, higher cognition, or conception) and perception? Certainly, it shows that there are ways the former can causally affect the latter. Does it show something more significant? Might it require a reconception of cognitive architecture—perhaps even call into question the distinction between perception and conception? Might it deprive us of a theory-neutral basis for adjudicating among competing hypotheses, or undermine perception's apparent role in providing independent warrant to beliefs?

At least for the case at hand, this seems unlikely (however interesting it may otherwise be). Because cognitive effects on perception via bodily movement are both unsurprising and indirect, it is unclear how they might challenge or reshape the distinction between seeing and thinking. Because this sort of attention can be so readily redirected, it is not obvious how it might render potential evidence inaccessible when comparing theories. And because it seems to filter information (for example, having you look here, not there) but not alter that which it selects, it would seem only to constrain the basis for one's beliefs, not to affect the epistemic relevance of what one does see.

But the case at hand is particularly unsubtle. In other cases, one's eyes can move in perceptually consequential ways without one's realizing it, even upon reflection: eye-tracking was required to demonstrate the role of saccades in flipping ambiguous images (e.g., Stark and Ellis, 1981). Moreover, attention is not limited to overt attention (the reorientation of gaze through bodily movement). Even with one's gaze fixed, covert attention can shift among locations, features, and objects. In such cases, the implications for perception's epistemic function and the perception-cognition relation are less clear and more controversial. One example: when attentional effects on perception are less obvious, so are their potential biasing effects on belief, whether malign (the neglect of contrary evidence in confirmation bias) or beneficial (when attention prevents us from missing what is most relevant). Another example, which will play a larger role in our discussion: because the mechanisms of covert attention seem more bound up with perceptual processing itself, a cognitive influence upon them appears to amount to a direct effect on perception in a way that cognition's effects upon overt attention do not.

Questions concerning cognition's bearing on perception are often framed in terms of the cognitive penetrability or impenetrability of perception. Applied to our topic, the question is thus whether cognitively driven attentional effects on perception can amount to cognitive penetration. But talk of cognitive penetration gets cashed out in various ways, so that the answer depends on just what cognitive penetration is supposed to be.

In what follows, we approach the question using Zenon Pylyshyn's characterization of cognitive penetration. It was he who coined the term, but, more importantly, his conception is well-motivated, as we will indicate in a moment. A further justification for our focus is that, although much subsequent discussion has centered on Pylyshyn's claims, his resources for precluding attentional effects from the purview of cognitive penetration have not been fully explored or exploited either by Pylyshyn himself or his other defenders.

Pylyshyn's concern is the degree to which visual perception is "continuous" with cognition. More specifically, he asks whether early visual states interact with cognitive states in the way cognitive states do with one another, in particular by mirroring rational relations. Early vision would be cognitively penetrated, on his view, if "the function early vision computes is sensitive, in a semantically coherent way, to the organism's goals and beliefs" (Pylyshyn, 1999a, p. 343). With this characterization in hand, he argues that cognitively driven attentional effects, though they provide the primary means by which cognition affects perception, do not amount to cognitive penetration. Indeed, showing that various phenomena offered in evidence of cognitive penetration in fact involve subtle attentional effects is among his principal strategies for rebutting others' claims. Thus, he counters Churchland's (1988) discussion of cognitive effects on the perception of ambiguous figures by adducing the evidence mentioned above for the role of eye movement in bringing about perceptual flips.

We agree that, when cognitive penetration is understood in Pylyshyn's way, cognitively driven attentional effects on perception do not amount to cognitive penetration—but not for the reason Pylyshyn emphasizes. Pylyshyn maintains that attentional effects occur either before or after early vision and thus do not directly affect early vision itself. Critics have focused on this claim, replying that in fact attention is bound up with perceptual processes at all levels. This in part accounts for a rising tide of attention-based cognitive penetrability claims (e.g., Lupyan, 2015; Mole, 2015; Wu, forthcoming). But there are other bases for excluding attentional effects from the purview of cognitive penetration. In particular, cognitively driven attentional effects fail to satisfy the requirement that there be semantically coherent sensitivity to cognitive states—or so we shall argue. Along the way, we differentiate our defense from those found in recent work by Raftopoulos (2009) and Firestone and Scholl (2015, 2016); and we respond to views that would assimilate attention to expectation and thereby argue that Pylyshyn's criterion can be met.

But are the significant questions best framed in Pylyshyn's terms? We conclude by considering other conceptions of cognitive penetration advanced in the literature, some of which do and some of which do not count cognitively driven attentional effects on perception as cognitive penetration. We consider as well how one might decide among them. In the end, we advocate a kind of pluralism, suggesting that there may be no one question of cognitive penetrability, but a variety of interesting successors and so no one answer to the question concerning attention with which we begin. Pylyshyn's conception is motivated, but others may be as well. Of course understanding the various ways cognition and perception interact and their upshot is more important than determining if there are phenomena worthy of Pylyshyn's label. But this conclusion does not undermine the interest of our earlier exploration of attention and cognitive penetration as Pylyshyn defines it: first, it is among the various interesting questions; and, second, considering questions of cognitive penetrability, including what cognitive penetrability should be, is a useful strategy for delineating the various interesting questions, even if it is a ladder one then throws away.

# ATTENTION AND COGNITIVE PENETRATION IN PYLYSHYN'S SENSE

To see why Pylyshyn holds that cognitively driven attentional effects on perception do not amount to cognitive penetration (and how else it might be defended), we should first clarify his conception of cognitive penetration. A few remarks concerning the relevant kind of attention will also prove useful.

# Pylyshyn's Conception of Cognitive Penetration

Roughly, cognition penetrates perception just in case it causally affects perception in the right kind of way (that is, subject to some sort of further constraints on the kind of causal effect). But views vary as to what counts as cognition, perception, and causing in the right kind of way. These differences matter for whether attentional effects can count as cognitive penetration. Pylyshyn, we saw, is concerned with whether "the function early vision computes is sensitive, in a semantically coherent way, to the organism's goals and beliefs." As he also puts it, cognitive penetration requires that early vision "can be altered in a way that bears some logical relation to what the person knows;" an instance of cognitive penetration must "alter the contents of perceptions in a way that is logically connected to the contents of beliefs, expectations, values, and so on" (Pylyshyn, 1999a, p. 343). The relevant cognitive states (the admissible source of would-be cognitive penetration) thus comprise for Pylyshyn the so-called propositional attitudes. The relevant target of his impenetrability claim is not perception tout court, which he claims is cognitively penetrable (Pylyshyn, 1999a, p. 344), but just so-called early vision, a substantial portion of the perceptual processes implicated in visual perception. It is a question just what early vision comprises. Pylyshyn mentions, for example, the calculation of stereo, motion, size, and lightness constancies (Pylyshyn, 1999a, p. 344). But we need not pursue the matter, since the considerations we ultimately adduce in Pylyshyn's defense do not rest on a particular conception of, and are not limited to potential effects on, early vision.

fpsyg-08-00221 February 21, 2017 Time: 16:45 # 2

Finally, for the causal effect of cognition on early vision to count as the "right kind," the representational contents of the cognitive states and of the affected early vision states must be related in a way that satisfies two conditions. First, early vision must itself have "access" (Pylyshyn, 1999a, p. 344 and passim) to the cognitive states. The cognitive states must exert their influence because early vision's computations take their contents into account by operating over them, not just because the cognitive states have effects on other states over which early vision computes. In this sense, the influence must be direct. Second, the contents of the cognitive states and the contents of the affected early vision states must stand in a relation of semantic coherence—or, as he also puts it, a logical or rational relation:

> We sometimes use the term "rational" in speaking of cognitive processes or cognitive influences. This term is meant to indicate that in characterizing such processes we need to refer to what the beliefs are about, to their semantics. The paradigm case of such a process is inference, where the semantic property truth is preserved. But we also count various heuristic reasoning and decision-making strategies (e.g., satisficing, approximating, or even guessing) as rational because, however suboptimal they may be by some normative criterion, they do not transform representations in a semantically arbitrary way: they are in some sense at least quasi-logical. This is the essence of what we mean by cognitive penetration: it is an influence that is coherent or quasi-rational when the meaning of the representation is taken into account (Pylyshyn, 1999a, p. 365, fn. 3).

These formulations raise further questions, but the basic idea of one representation not just causing another, but providing a reason for it, will suffice for our purposes.<sup>1</sup> The requirements of directness and semantic coherence articulate the kind of "sensitivity" cognitive penetration requires. It is not enough that the contents of early vision states be sensitive to those of cognitive states in the weaker sense of depending counterfactually, statistically, or in a law-like manner upon them. They must also do so in virtue of early vision itself operating over the cognitive states in a manner that mirrors a rational relation.

Not all parties to cognitive penetrability disputes characterize the would-be phenomenon in this way. In particular, some drop the requirement of directness, and many drop the requirement of semantic coherence. We canvass some of these alternatives below. For now, we underscore Pylyshyn's motivation. Pylyshyn is interested in whether vision and cognition are "continuous," as New Look psychology suggests (Bruner, 1957). A central feature of propositional attitudes is that they do directly affect one another in semantically coherent ways. Indeed, their availability for rational inference about what to believe and what to do—and the conceptual structure this imposes upon them—is among their most important functional features. If perceptual states—more specifically, states of early vision—interacted with propositional attitudes in a similar way, this would be a strong argument for a crucial continuity with them. If they do not, it is a crucial discontinuity. Establishing Pylyshyn's thesis thus helps mark and preserve at least an aspect of the perception-cognition distinction itself. It is important to note, however, that the (not necessarily exhaustive) distinction need not rest solely upon this discontinuity. For example, one might also differentiate perceptions and cognitions by their relative stimulus-dependence—more specifically, whether it is their function to represent the here and now (cf. Pylyshyn, 1999a, p. 343; also Burge, 2010). (Pylyshyn, 2002, however, rejects one oft-proposed basis for drawing a perception/conception distinction: that the former have iconic and the latter symbolic representational formats.)

### Kinds of Attention

Attention comprises a variety of phenomena and is perhaps something of a motley (Allport, 1993). But we can pinpoint, or at least minimally clarify, what kind of attentional effect is at issue here.

Attentional phenomena sub-divide in various, sometimes cross-cutting ways. Attention can be external (selecting and modulating sensory information) or internal (selecting, modulating, and maintaining memories, choices, responses, and other non-sensory representations) (Chun et al., 2011). Our focus is of course external attention, since we are considering a candidate case of the cognitive penetrability of perception. As mentioned, if external attention involves the movement of sensory organs, it is overt; otherwise, it is covert. It is widely agreed that cognitive effects on perception mediated by overt attention—as with our example of looking to the left because one wants some chocolate and believes that is where it is—should not count as cognitive penetration, because the effect is not direct or because admitting them would render the topic uncontroversial. Thus, our focus is covert attention. Note, though, that cases of pure overt attention shifts may be atypical. For among the hypothesized functions of covert attention is to prepare or guide overt attention shifts, for example by highlighting a target for eye movement or visual search (Kowler, 2011; Nakayama and Martini, 2011). (Another hypothesized function, relevant to social cognition, is to allow undetectable attention shifts—see Laidlaw et al., 2016.) Cases of cognitively driven overt attention shifts could therefore involve cognitive penetration—albeit not in virtue of their overt aspect—if concomitant cognitively driven covert attentional shifts can amount to cognitive penetration.

<sup>1</sup>This gloss on Pylyshyn's constraint is more demanding than others found in the literature—cf. Stokes, 2013 for discussion—but it finds support in the quoted text. If it is not evident how the content of a cognitive state might supply a reason bearing on that of a perceptual state (as opposed to vice versa), recall that, at least since Helmholtz, it has been common to think of perception as engaged in something like inference: the question then is whether (and, if so, how) cognitive states can contribute to this inference-like process. Models on which they can are discussed below.

Cases of covert attention can be classified by what drives them. Exogenous attention is driven directly by external cues in a bottom-up fashion, as when attention is captured by a sudden noise. Endogenous attention is driven from within in a top-down fashion. The top-down processes involved in endogenous attention can, however, occur in response to an external cue, as is typically the case in experimental settings. Endogenous and exogenous cues differ according to whether they must be in some sense understood. For example, whereas an exogenous cue might increase attention to a location by simply occurring there, an endogenous cue might do so by indicating that location via an arrow or by a symbolic description ('up'). Because endogenous cues must be understood, they achieve their attentional effects by engaging mechanisms and cortical areas different from those required for exogenously driven attentional effects; and there is a corresponding difference in time-course: 300 ms from endogenous cue to attention shift, compared to a 100–120 ms peak for exogenous cues (Carrasco, 2011). However, that a cue generates an endogenous shift in attention might not of itself entail that the shift is cognitively driven in Pylyshyn's sense. It is possible that the relevant representation or association (e.g., of an arrow and a direction) is contained within perception itself, or comes to be over a course of trials (Pratt and Hommel, 2003; Stevens et al., 2008; Pratt et al., 2010). This is just a particular instance of a point proponents of cognitive impenetrability have always emphasized: that top-down does not entail cognitive, since top-down processing can occur within perception itself (Fodor, 1983; Pylyshyn, 1999a). That said, cognitively driven attention is of necessity endogenous. So, we will only be concerned with it.

Raftopoulos and Lupyan (2017), in laying out the research topic to which this paper is a contribution, make special mention of pre-stimulus cues. It is thus worth noting that the only role external cues play in endogenous attention is to generate the internal states that then cause the attention shift. In that sense, the cue's role is indirect and essentially irrelevant once its job is done. Perhaps, then, there is no special question of whether cognitively driven attentional effects on perception brought on by pre-stimulus cueing (as opposed to, say, an unprompted decision to attend in a certain way) count as cognitive penetration—that is, nothing further to ask beyond whether cognitively driven attentional effects on perception count as cognitive penetration more generally. This turns out to be the case given Pylyshyn's conception of cognitive penetration: what brought about the cognitively driven attentional effect will be irrelevant to the considerations we adduce. (There may be, however, as indicated above, a question whether particular endogenous attentional effects are in fact cognitively driven.)<sup>2</sup>

Finally, cases of covert attention can also be classified by their object—i.e., what one attends to. Work over the last few decades typically distinguishes spatial, feature-based, and object-based attention (Carrasco, 2011). (For simplicity, we bracket temporal attention, a special case that is arguably not as well understood and perhaps spans the external/internal divide. See Nobre and Coull, 2010; Phillips, 2012; Gross, in preparation.) This classification matters for certain arguments mentioned below, but its importance will fade once we focus on our alternative defense of Pylyshyn's claim.

# PYLYSHYN'S ARGUMENT AND ITS CRITICS

Pylyshyn maintains that cognitively driven attentional effects, whether overt or covert, do not provide examples of cognitive penetration of early vision. His main argument is that attentional effects either help determine the inputs to early vision or help select from among its outputs.<sup>3</sup> Attentional effects may indirectly affect early vision (even selection effects after early vision may indirectly affect early vision—for example by causing an effect on its subsequent inputs). But because they involve no direct effect on early visual processing itself, they do not exhibit a way "the function early vision computes is sensitive, in a semantically coherent way, to the organism's goals and beliefs."

<sup>2</sup>Whether a cognitively driven attentional effect was brought about by pre-cueing could matter for others' arguments. For example, as we will see, Raftopoulos (2009) puts much weight on time-course considerations. Suppose that what brings about a cognitively driven attentional effect matters to the subsequent time-course of that effect on early vision. It would then be of note if, say, attentional effects brought about by pre-cueing occurred within the window of early visual processing, while effects brought about by a decision to attend could not. The importance of such a difference would be lessened, however, if such effects need not occur "online"—that is during early visual processing—to be relevant to questions of cognitive penetrability, but could instead affect subsequent early visual processing. Raftopoulos and Zeimbekis (2015, p. 23) raise the possibility that, because "precueing does not affect visual processing in a direct, online way, but just sets the initial values of certain parameters for subsequent computations," it does not amount to cognitive penetration. But it is unclear whether computationally there is a substantive difference between directly supplying an input and fixing a parameter (and whether it matters if one happens before the other). Our discussion of semantic coherence might supply a way of cashing out this difference. But then the relevant distinction is not one particular to pre-stimulus cueing. (Note, incidentally, concerning relative timing, that what matters—if it does—would be that the input/parameter fixing from cognition occurs before sensory input from the stimulus, not that the endogenous cue that drives cognition comes before the stimulus.)

<sup>3</sup> I say that this is his main argument—and elsewhere that this is the argument he emphasizes—because, just after providing his characterization of cognitive penetrability, he illustrates it as follows:

Note that changes produced by shaping basic sensors, say by attenuating or enhancing the output of certain feature detectors (perhaps through focal attention), do not count as cognitive penetration because they do not alter the contents of perceptions in a way that is logically connected to the contents of beliefs, expectations, values, and so on, regardless of how the latter are arrived at (Pylyshyn, 1999a, p. 343).

But the rest of his many remarks on attentional effects focus solely on their being prior or posterior to early vision.

Critics respond that attentional effects are found at all levels, or stages, of processing. Thus, either they occur in early vision or, if early vision is to be insulated from them, there seems nothing substantial left for early vision to be. Talk of levels can be cashed out in various ways. Yeh and Chen (1999), for example, argue that results finding attentional effects at various cortical levels leave little space for an attentionally insulated stage of early vision. (Cf. Lupyan, 2015, p. 560: "As is now well-known, attention modulates processing at all levels of the visual hierarchy . . .. Prima facie, these findings appear to be devastating for opponents of the [cognitive penetrability of perception] thesis . . ..") Pylyshyn (1999b, p. 410) replies that one cannot assume a straightforward mapping between cortical and computational stages, and it is the latter with which he is concerned. However, subsequent attention research has arguably trended towards a convergence of behavioral and neurophysiological data that shifts the burden onto anyone who would defend Pylyshyn's claim that attentional effects are external to early vision. As Carrasco (2011, pp. 1485–1486) writes in summing up the preceding 25 years of research:

Initially, there was a great deal of interest in categorizing mechanisms of vision as pre-attentive or attentive [i.e., involving selection after early vision]. The interest in that distinction has waned as many studies have shown that attention actually affects tasks that were once considered pre-attentive, such as contrast discrimination, texture segmentation and acuity. . .. this review focuses on the effect of attention on basic visual dimensions where the best mechanistic understanding of attention to date has been achieved, such as contrast sensitivity and spatial resolution [. . . and . . .] motion processing . . .. due to the existence of models of these visual dimensions, as well as to the confluence of psychophysical, single-unit recording, neuroimaging studies, and computational models, all indicating that attention modulates early vision.

Note that an "all levels" claim can affirm that the fundamental function of attention is to cull inputs. This is a natural idea whose mechanisms are becoming better understood. (For example, recent work suggests that attention's primary function is to select among stimuli and thus reduce the cost of stimulus mixing in cortical response, not to increase response sensitivity or reduce noise via an increase in gain in relevant areas. See Pestilli et al., 2011; Orhan and Ma, 2015.) What an "all levels" claim rejects is just that this culling of inputs does not occur inter alia within early vision. (Note that the inputs to be culled could be sensory inputs or inputs provided by one computational mechanism to the next.) Firestone and Scholl's (2016, pp. 23–24) reply—citing, incidentally, the same Carrasco survey—to the objection that not all attentional shifts are like moving one's eyes would thus cut no ice if directed against the "all levels" complaint:

. . . fundamentally, "attention is a selective process" that modulates "early perceptual filters" (Carrasco, 2011, pp. 1485– 1486, emphasis added). This is what we mean when we speak of attention as constraining input: attention acts as a "filter" that "selects" the information for downstream visual processing, which may itself be impervious to cognitive influence.

If this selection occurs at "all levels"—in particular, at each stage of processing within early vision—it remains the case that no substantial component of visual perception may remain that is "impervious to cognitive influence."

There are indeed ways one might attempt to defend Pylyshyn's claim that attentional effects are external to early vision, either preceding or following it. But their prospects are inessential to our main point: we argue that one can in any event defend on other Pylyshian grounds the claim that cognitively driven attentional effects on early vision do not amount to cognitive penetration. However, because debate has focused on where attentional effects are felt, we provide a brief indication of possible directions one might explore on this front. Theeuwes (2013) argues that all feature-based attention involves bottom-up priming. Having perceived a certain feature, one's perceptual system is then primed to perceive it again, regardless of its location or the object that has it. Previous work, he argues, missed this by presenting subjects with blocks of trials that did not control for stimulus history. When stimulus history is controlled for, feature-based attentional effects disappear. If he is right, this removes one candidate category of cognitively driven attentional effects. Recent work indicates that object-based attention is likewise subject to the effects of stimulus history (Lee et al., 2012). A rather hopeful defender might speculate that it too might all be bottom-up. Alternatively, she might pin her hopes on the minority view that apparent object-based attentional effects are really spatial (see Reppa et al., 2012 for references and critical discussion). But, failing that, it may be conceded in any event that object-based attentional effects occur only after early vision. This would leave spatial attention. As we noted, the work of Carrasco and others suggests that attentional effects are entwined with perceptual processing throughout early vision.
But Schneider (2006, 2011) attempts to explain their results by positing an attentional effect on salience and a post-perceptual decision bias in favor of salient items, rather than an attentional effect on perceptual content. On this view, salience, though a property of perceptions, is not something itself represented in perception. By affecting salience, attention might have a kind of effect on perceptual processing at "all levels" on such a view, but not the kind relevant to cognitive penetration—viz., an effect on content. It would only have that kind of effect post-perceptually, at the level of perception-based judgment. (See Beck and Schneider, forthcoming for philosophical discussion, replying to Block, 2010.) If these moves (or others) were to pan out, they would vindicate Pylyshyn's claim that attentional effects—or at least the relevant ones—are all external to early vision. Many will consider that a big if. We now argue that the claim it was intended to subserve—that cognitively driven attentional effects on early vision do not amount to cognitive penetration—can be defended in any event. Discussion need not fixate on the locus of attentional effect.

# AN ALTERNATIVE DEFENSE OF PYLYSHYN'S CLAIM

If cognitively driven attentional effects were all external to early vision, that would suffice to show that they are indirect: the function early vision computes would not be sensitive to them in a semantically coherent way. But showing that they are in fact internal does not yet show that Pylyshyn's requirements on cognitive penetrability are satisfied. Are there grounds for thinking they are not?

It might be thought that it does not matter where in processing the effect is felt: the effect must be indirect simply because it is mediated by attention, so that, even when cognitively driven attention exerts its influence on early vision (not just on inputs to early vision), this involves first cognition affecting attention, which then in turn affects early vision. But Mole (2015) argues that whatever plausibility this thought may have for overt attention, it requires a mistaken picture of covert attention as a faculty or capacity distinct from perception and capable of causally affecting it—as opposed to its just being itself a certain kind of effect in perception.<sup>4</sup> If covert attentional effects are a part of perceptual processing, then, pending the identification of other mediating factors, cognitive effects on perception supposedly mediated by covert attention are direct effects on perception.

The indirectness objection, however, can be pressed in a more subtle way. To see how, consider what occurs when cognitively driven attention affects early vision (not via an effect on sensory input to early vision). The decision to attend can spring wholly from within (we consider such a case below), but in a typical experiment the subject responds to an endogenous cue—what is in effect an instruction from the experimenter. For example, if the cue is an arrow pointing up or the word 'up,' it is an instruction to attend there. If the case is to satisfy Pylyshyn's criteria for being cognitively driven, the subject will come to have an intention or other action-directed attitude to attend there. She will do this on the basis of such other attitudes as her belief that she has been instructed to attend there, her desire to cooperate, etc. Our suggestion is that these attitudes generate what we might call an attentional command to attend there, which—on a causal-computational account—would exert its influence on perceptual processing, affecting perceptual content (at least on the common view suggested by Carrasco's work and others'—we put aside here Schneider's animadversions). If the ascription of this attentional command seems fanciful, consider that it is common for computational models of perceptual attention to include attentional parameters that weight the effect of sensory signals (e.g., Lee and Maunsell's (2009) divisive normalization model, brought to bear on cognitive penetration in Wu (forthcoming)). 'Attend to this, this much' is a natural way to gloss their content. And possess content they must if we are so much as to have a candidate case of cognitive penetrability in Pylyshyn's sense. (In discussing attention and expectation below, we consider models that would dispense with such parameters.)<sup>5</sup>
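To make the idea of an attentional parameter that "weights the effect of sensory signals" concrete, here is a minimal sketch of an attention-weighted divisive normalization. It is only illustrative: the function name, the semi-saturation constant `sigma`, and the numerical values are assumptions introduced here, and the formula is a generic textbook-style normalization, not the published equations of Lee and Maunsell (2009).

```python
def normalized_responses(signals, attention, sigma=1.0):
    """Schematic attention-weighted divisive normalization.

    signals   : non-negative sensory drive for each stimulus
    attention : non-negative attentional weight for each stimulus
                (roughly, "attend to this, this much")
    sigma     : semi-saturation constant (hypothetical value)
    """
    if len(signals) != len(attention):
        raise ValueError("signals and attention must have the same length")
    # Each sensory signal is scaled by its attentional weight ...
    weighted = [a * s for a, s in zip(attention, signals)]
    # ... and then divided by a pool that sums over all weighted signals.
    pool = sigma + sum(weighted)
    return [w / pool for w in weighted]


# Two equally strong stimuli, first with equal weights, then with
# attention favoring the first stimulus.
equal = normalized_responses([10.0, 10.0], [1.0, 1.0])
biased = normalized_responses([10.0, 10.0], [3.0, 1.0])
```

On this toy model, raising the weight on one stimulus increases its normalized response and suppresses the response to the other, even though the sensory signals are unchanged. The weight enters the computation as a term, but nothing in the computation treats it as evidence about the stimulus.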

Now, if the directness aspect of Pylyshyn's criterion is to be satisfied, perception must have access to and operate over relevant cognitive states: the values of cognitive states must be among the inputs to perceptual processes. One way to defend Pylyshyn's cognitive impenetrability claim, then, is to argue that (1) the attentional command is not itself a cognitive representation, and (2), although the attentional command plays a role in perceptual processing, the attitudes that generate it do not. The attitudes that generate the attentional command are thus not accessed, and the attentional command is not cognitive; so, no cognitive state is accessed. Perception, by not accessing the cognitive states themselves, would thus not be in that sense sensitive to them; it would only feel the causal effect of those states. We could still say, with Mole (2015), that there is a sense in which cognition affects perceptual processing in an unmediated way. For the cognitive states, on this view, could directly generate the attentional command, itself a part of perceptual processing. And yet, in another sense, we would have to say that there is mediation after all: for, though the attentional command may be directly generated by cognitive states, the attentional effect on perception would not be. We can think of the effect either as the result brought about or as the bringing about of the result. In the latter sense, the attentional effect consists in the attentional command exerting its influence and thereby affecting perceptual content (the actual transition from one representational state to another, as brought about in part by the attentional command—in other words, the calculation of the function in which the attentional command is a term). In the former sense, it is just the resulting perceptual state (or perhaps some aspect it would otherwise not have had). Either way, the attentional effect would not be directly generated by cognitive states. 
Moreover, consider the state that is directly affected—the attentional command. Though it is perhaps (if we deny it cognitive status) a representational state in perception, it is not itself a perceptual state, at least in the sense of a state whose function is to represent the here and now. Thus, cognition's direct effect on it does not constitute cognitive penetration of perception.

Perhaps, when all is said and done, this is correct. But it might not be the most convincing way to defend Pylyshyn's position. For it is unclear on what basis one can compellingly persuade a proponent of cognitive penetrability that the attentional command is not a cognitive representation. In particular, to base one's case on the fact that it interacts with perceptual representations would just beg the question: that perceptual processes can have access to cognitive states is precisely what a proponent of cognitive penetration in Pylyshyn's sense claims.

A stronger argument instead incorporates the preceding considerations into a dilemma. For if the attentional command is considered a cognitive state, its influence on perception runs afoul of the semantic coherence constraint—so that cognitive penetrability is blocked whichever status one assigns the attentional command. Recall that the semantic coherence constraint requires that the content of the accessed cognitive state bear an inferential relation to the content of the resulting perceptual state. Attending may exert a causal effect on what one sees. But it does not provide an epistemic basis for it. Here a comparison with turning on a light is appropriate (cf. Firestone and Scholl, 2015, p. 8). Turning on a light—perhaps in response to a request—might enable one to see that there is something red there, but not because the turning on of the light is evidence for it. Matters stand otherwise in cases where, according to Pylyshyn, the semantic criterion is met. In late vision, he claims, we might draw upon various beliefs to identify some object. If early vision outputs a representation of an object as having such-and-such shape and coloring, etc., we may access any number of beliefs in coming to then represent the object as Ms. Jones—for instance, beliefs about what Ms. Jones, as opposed to other persons and things, looks like (Pylyshyn, 1999a, p. 344). The contents of these beliefs do not just cause, but provide an epistemic basis for the resulting representation of this as Ms. Jones. Similarly, for the claimed influence of color memories (that hearts are red, bananas yellow) on color perception (Delk and Fillenbaum, 1965; Macpherson, 2012; Gross et al., 2014; Witzel and Hansen, 2015).

<sup>4</sup>Cf. Raftopoulos (2009) and Anderson (2011). Mole (2015) develops the point with reference to Desimone (1998) and Duncan's (1998) biased competition model of attention. Gross (2016) critically discusses some aspects of Mole's arguments. (For someone claiming that attention is a cognitive process, see Lupyan, 2015, p. 560.)

<sup>5</sup>On some views, perception-based demonstrative reference in thought requires prior attention to the demonstratum (Campbell, 2002). The apparent circle can be avoided by denying this requirement or by distinguishing among attentional mechanisms (Wu, forthcoming).

Indeed, intentions and commands are not the sorts of states that can provide reasons in the relevant sense. They do not provide epistemic grounds, though they can be related to reasons for action. To attend is a kind of act (a mental act—cf. O'Brien and Soteriou, 2009); and, unless it is just a whim, when one forms an intention to attend, one does so on the basis of reasons to attend—in the experimental setting, because you are instructed to so attend and want to cooperate. There is thus an appropriate semantic relation between the relevant cognitive attitudes and the attentional command. Arguably, there is also an appropriate semantic relation between the attentional command and the carrying out of what it commands (viz., the mental act of attending itself). But what there is not is a semantic relation between the attentional command (or the attending) and the resulting perceptual state, the state that exhibits the attentional effect. One might have good reason to turn on a light, and one's doing so might cause you to see a red thing there. But the reasons for turning on the light (viz., because you were asked) do not supply any epistemic basis for what you see—that is, for there being something red there—nor does your turning on the light constitute any such reasons. Just so with your reason for attending and for the attending itself: they causally affect what you see, but are not themselves grounds for it.

This is not to say that the directness requirement of Pylyshyn's conception of cognitive penetration plays no role in turning aside challenges arising from covert attentional effects. To see this, consider the following objection. One might worry that the argument just given hinges on the kind of case we are considering, where one has simply followed the experimenter's instruction. But perhaps things are otherwise with at least some more internally generated intentions to attend. Suppose one decides to attend on the basis of some belief about what one will see. For example, you need something red to balance out a design and believe something red is over there. Attending there will raise the probability of your getting something red. So, you attend there and, as a result, see something red there. Here we seem to have an appropriate semantic relation between a cognitive state (the belief that something red is there) and the resulting perception (a visual representation as of something red there).

But even though the belief in part causes the perception and their contents seem to stand in an appropriate relation, it is not the case that the perceiver or her visual system treats the belief as evidence for what she sees. This point can be developed in terms of directness: the visual system does not itself access, and so does not take into account, the person's belief that something red is there; it is just influenced by the action command that in part results from the belief, given how one reasoned what to do. Moreover, the worries raised above about appealing to directness do not apply here: it is contentious to deny that the action command is a cognitive state, but less so to deny that the belief that helps generate the action command in a case like this is itself accessed by early vision—at least insofar as the belief's attentional effect is concerned (we return to this caveat in discussing attention and expectation below).

Perhaps one may argue that in fact the semantic coherence constraint, properly understood, is also not satisfied in this case. Consider the distinction between one claim's providing a reason for another and one claim in fact being the reason for which someone upholds a claim. (For example, it might be that A entails B, and one believes both A and B, but one does not believe B on the basis of A because one does not realize that A entails B—nor does one, or one's reasoning capacity, otherwise encode or "embody" the entailment.)<sup>6</sup> If we may apply an analogous distinction in theorizing about the visual system, then we might suggest that semantic coherence requires not just that the cognitive state in part cause the perception and that as a matter of fact there be a semantic relation between their contents, but that the content of the former be part of the basis upon which the content of the latter is generated. This might seem to eliminate the need to advert to directness in replying to such cases after all. But whether this is so depends on what precisely providing a basis requires. A natural cashing out would require directness: being able to access and operate over the cognitive state. If so, this formulation of semantic coherence would simply build in directness. Note though that, even if semantic coherence were construed broadly to include directness, it would not collapse the two constraints: semantic coherence would still go beyond directness. What we have seen is that attentional parameters in visual computations provide an example of how representations can be accessed and operated over without their role in the computation being appropriately inference-like in Pylyshyn's sense. Otherwise put, they show that, even though a computational transition might itself be deemed an inference, or inference-like, not all elements of the computation need be (quasi-)reason-giving. Attentional weights affect computations in a different way.

Let us take stock. We suggest that, for cognitively driven attentional effects on perception to amount to cognitive penetration, there must be propositional attitudes that generate an attentional command. The attentional command finds a place in computational models of perceptual processing along the following highly schematized lines: f(s, a) = p—where 's' represents the sensory signal, 'a' the attentional command (attentional weighting), and 'p' the resulting perceptual state.<sup>7</sup>

<sup>6</sup> I am bracketing various further nuances. For example, A might be a non-entailing reason for B, but one may not uphold B on the basis of A, not because one does not realize that A is a reason for B, but because other, overall stronger considerations lead one to deny B.

<sup>7</sup>Of course, less schematized models can allow sub-transitions from perceptual states, weighted by attention, to further perceptual states; inputs from other sources; probabilistic states; and many other complications. And they will unpack f.

This introduces a variety of candidate loci for cognitive penetration, in part depending on whether the attentional command is a cognitive state or not. The candidates are: the attitudes' effect on the command, the command's effect on the perception, and the attitudes' effect on the perception. But the attentional command does not stand in the appropriate semantic relation to the resulting perceptual state for it to penetrate perception. And if one maintains further, albeit contentiously, that the attentional command is not a cognitive state, then it is not even a candidate source of penetration. The relation of the generating attitudes to the attentional command does satisfy the semantic criterion. But this is irrelevant if the attentional command is itself a cognitive state and so not a candidate object of penetration. And if the attentional command is not a cognitive state, still, it is not a perception (and thus not a candidate object of cognitive penetration), even if it is a representation in perception in the sense of being operated over in perceptual processing. Finally, though the attitudes that generate the attentional command can sometimes have reason-giving content relative to the resulting perceptual state, in such cases the directness requirement is violated, since perceptual processing, so far as the attentional effect is concerned, does not access the content (and in that sense is not sensitive to it). (And this might violate semantic coherence as well on a broad construal.)

Cognitively driven attentional effects on early vision thus do not provide examples of cognitive penetration in Pylyshyn's sense. But not for the reason he emphasizes. Even if there are attentional effects that do not occur before or after early vision, they still fail to satisfy Pylyshyn's requirements—either of directness or of semantic coherence.
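The schema f(s, a) = p and the point about directness can be made concrete with a minimal sketch. It is illustrative only: the names `Attitude`, `generate_command`, and `perceive`, the multiplicative form of the weighting, and the numerical values are all assumptions introduced here, not part of any model discussed in the text. The sketch shows that `perceive` takes only the sensory signal and the attentional weights as arguments, so attitudes with different contents that generate the same weights yield the same output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attitude:
    """Hypothetical stand-in for a propositional attitude."""
    content: str            # what is believed or intended
    attended_location: int  # where the attitude leads attention to be directed


def generate_command(attitude: Attitude, n_locations: int, gain: float = 2.0) -> List[float]:
    """Cognition side: turn an attitude into attentional weights (the command 'a').

    Only the target location survives; the attitude's content is not encoded in 'a'.
    """
    return [gain if i == attitude.attended_location else 1.0
            for i in range(n_locations)]


def perceive(s: List[float], a: List[float]) -> List[float]:
    """f(s, a) = p, here assumed to be a simple multiplicative weighting."""
    return [s_i * a_i for s_i, a_i in zip(s, a)]


s = [0.2, 0.5, 0.3]  # toy sensory signal at three locations
belief = Attitude("the target will appear on the left", attended_location=0)
intention = Attitude("I was asked to monitor the left", attended_location=0)

p_belief = perceive(s, generate_command(belief, len(s)))
p_intention = perceive(s, generate_command(intention, len(s)))
print(p_belief == p_intention)  # True: same command, same perceptual output
```

On this toy picture, the content of the generating attitude plays no role once the weights are fixed, which is the sense in which perceptual processing does not access that content.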

## COMPARISON WITH OTHER DEFENSES

We can sharpen these points by differentiating our defense from Raftopoulos's (2009) and Firestone and Scholl's (2015, 2016) defenses of similar positions.

### Raftopoulos

Raftopoulos (2009) argues that perception, by which he means early vision sans sensation, is not cognitively penetrated. Drawing in part upon Lamme and colleagues (e.g., Lamme, 2003), Raftopoulos argues that early visual processing culminates around 120 ms after stimulus onset, following a feed-forward sweep that leads to the establishment of locally recurring networks; post-perceptual cognitive processes involve later top-down feedback from higher cortical areas. His emphasis on time course provides one way of supporting Pylyshyn's suggestion that computational stages do not necessarily line up with location in the cortical hierarchy, since later temporal stages reuse areas implicated in earlier stages. But, contra Pylyshyn on attention, Raftopoulos adduces evidence that, within this time frame, one finds cognitively driven attentional effects, stemming from prestimulus cueing, upon early vision.

Raftopoulos argues that, nonetheless, these attentional effects do not constitute cognitive penetration. For, though they facilitate processing, they do not affect the resulting perceptual content (e.g., Raftopoulos, 2009, p. 83). This claim seems in tension with his later suggestion that such attentional effects constrain the interpretation of ambiguous figures (Raftopoulos, 2009, pp. 294–295).<sup>8</sup> But, in any event, it also commits him to rejecting Carrasco's interpretation of her results as showing attentional effects on content in early vision—whether he might reject it on Schneider's grounds or some other. Our arguments require no such commitment.<sup>9</sup>

Note also that Raftopoulos (2015), like Pylyshyn, argues that later vision is indeed cognitively penetrated. But their arguments differ. Pylyshyn, as we saw, adduces cases where cognitive states are accessed in attributing further features. While some of Raftopoulos' arguments take this form, Raftopoulos also adverts to attentional effects not involving access to other cognitive states (Raftopoulos, 2015, pp. 283–284). Our defense of Pylyshyn would also preclude such attentional effects in later vision from counting as cognitive penetration. Cases, however, where attention is what facilitates the access of relevant cognitive states are another matter—see Raftopoulos (2015, p. 284f). Nor, more generally, would our considerations speak to the non-attention-centered arguments that Raftopoulos, like Pylyshyn, advances.

### Firestone and Scholl

Firestone and Scholl (2015, p. 36) appear to agree with our reply on Pylyshyn's behalf when they argue that at least some covert attentional effects on perception "may be occasioned by a relevant intention or belief, but they are not sensitive to the content of that intention or belief." (They limit their scope to "many" such effects, allowing that there may be other more "rich and nuanced" cases not covered.) But consider how they argue for their claim:

A critical commonality [with overt attention or turning off the lights], perhaps, is that the influence of attention (or eye movements) in such cases is completely independent of why you attended that way. Having the lights turned off will have the same effect on visual perception regardless of why they were turned off, including whether you turned them off intentionally or accidentally; in both cases it's the change in the light doing the work, not the antecedent intention. And in similar fashion, attention may enhance what is seen regardless of the reasons that led you to deploy attention in that way, and even whether you attended voluntarily or via involuntary attentional capture; in both cases, it's the change in attention doing the work, not the antecedent intention (Firestone and Scholl, 2015, pp. 35–36).

Firestone and Scholl's main point, translated into our terms, is that it is the attentional command that does the work, regardless of what generated it. This, we have seen, justifies the conclusion that propositional attitudes that generate an attentional command do not satisfy Pylyshyn's directness requirement. So, we indeed agree with Firestone and Scholl's conclusion to this extent. But their discussion is incomplete; the directness constraint cannot by itself do all the necessary work.

<sup>8</sup> Some of Raftopoulos' other remarks (e.g., Raftopoulos, 2009, p. 322) likewise suggest that spatial attention does affect perceptual content, but only in virtue of selecting what signals get (fully) processed, not in virtue of any further effect on processing. He deems such effects indirect, but it is not clear in what sense, since he allows effects of spatial attention within early vision.

<sup>9</sup>Raftopoulos' view that attention only facilitates processing without affecting perceptual content is in part buttressed by his view that perception delivers "rich" iconic representations. For some challenges to the evidence in favor of "rich" over "sparse" perceptual representations in vision, see Gross and Flombaum (forthcoming).

Note first that Firestone and Scholl's talk of attention's influence on perception might suggest that, even in the covert case, we should conceive of attention as a faculty or capacity distinct from perception. But, similarly to Mole (2015), Firestone and Scholl (2016, p. 24) also write: "Our project concerns the 'joint' between perception and cognition, and attention unquestionably belongs on the 'perception' side of this joint". The apparent tension vanishes if we import our distinction between the attentional command and the attentional effect. But doing so also helps us see that Firestone and Scholl have left undone some of the work we undertook above. At the risk of repetition, let us review how this plays out. If attention is located on the side of perception, then one might argue the generation of an attentional command from propositional attitudes is itself a direct effect of cognition on perception. Now we have a choice point. If we allow that the attentional command, and not just the attentional effect, is indeed located on the side of perception, then we need to argue that, though the attentional command is a representation in perception, it is not itself a perception. If we rather place the attentional command in cognition, then it is not even a candidate target of penetration and clearly the directness of its relation to the generating attitudes is irrelevant. But then we must ask about the relation between the attentional command and the resulting effect. Here directness is not the issue, but rather semantic coherence—and this constraint is absent from Firestone and Scholl's argument.

It may seem otherwise, since they speak of sensitivity to content. But reflection on their argument makes it clear they have in mind directness, not semantic coherence. Consider a case where I in fact come to believe X by inferring it from Y, but I could have acquired belief X via hypnosis. That I could have acquired the belief in another way does not change the fact that I actually acquired it via a cognitive state that provides a reason in its favor. Similarly, as we saw, the attitudes that generate an attentional command can likewise satisfy semantic coherence (on its narrow construal) relative to the resulting perceptual state. That the same attentional command could have been otherwise generated is irrelevant, so far as semantic coherence is concerned. But these alternatives do matter for establishing indirectness, which is thus what Firestone and Scholl's focus on the irrelevance of what caused the attentional change must be about. So, there is a distinction (between attentional commands and effects) and a further requirement for cognitive penetration (semantic coherence) that Firestone and Scholl omit.

## TWO OBJECTIONS

We conclude by replying to two objections. According to the first, attention should be assimilated to expectation; and, once it is, cognitively driven attentional effects, recharacterized in a Bayesian framework, seem to satisfy Pylyshyn's requirements. According to the second, we need not stick with Pylyshyn's characterization of cognitive penetrability in any event; and, on some other, well-motivated conceptions, cognitively driven attentional effects can indeed amount to cognitive penetration.

### Attention as Expectation

Above, we considered the objection that covert attention does not involve a distinct faculty intermediate between cognition and perception. We responded by arguing that, nonetheless, one may distinguish between an attentional command and attentional effects—adding that attentional weightings, common in computational models, are naturally construed as attentional commands. This restored a notion of attentional cause without reifying an attentional faculty.

It may be replied that this does not take sufficiently seriously the claim that attentional effects are a by-product of perceptual processing and do not involve attentional causes at all (Anderson, 2011; Vincent, 2015). The 'by-product' claim is often developed within a Bayesian framework that treats attentional effects as resulting from expectations (Dayan and Zemel, 1999). On the Bayesian approach, perception solves the problem of inferring the distal scene from noisy, ambiguous sensory signals by performing, or approximating, a Bayesian inference that balances the likelihood of a sensory signal, given a candidate distal cause, and the prior probability of that cause. To say, in this framework, that attentional effects result from expectations is thus to say that observed attentional effects can be accounted for in terms of the priors perception brings to bear in inferring distal causes (e.g., Rao, 2005).

If all attentional effects could be accounted for in this way, the model would require no specifically attentional parameters. For example, rather than a command to attend this much to this location, there might be an increased expectation that the target will be there. Moreover, not only would the expectation cause the attentional effect, it would do so because perception would take it into account in (quasi-) inferentially generating its output.<sup>10</sup> Replacing attentional commands with expectations would thus both remove the barrier to directness and guarantee the satisfaction of semantic coherence.
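A minimal sketch may help fix ideas about how an expectation, rather than an attentional command, could do the work on the Bayesian approach. The two candidate causes ("red" and "blue") echo the illustration in footnote 10; the function name `posterior` and all probabilities are hypothetical values chosen for illustration, not drawn from any cited model.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over discrete candidate distal causes.

    prior[c]      : probability of cause c before the signal arrives
    likelihood[c] : probability of the observed sensory signal given cause c
    """
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


# An ambiguous sensory signal: equally likely under either cause (assumed values).
likelihood = {"red": 0.5, "blue": 0.5}

no_expectation = {"red": 0.5, "blue": 0.5}
expect_red = {"red": 0.8, "blue": 0.2}  # increased expectation of something red

print(posterior(no_expectation, likelihood))
print(posterior(expect_red, likelihood))
```

With the flat prior the two causes remain equally probable; with the raised expectation the posterior favors red. Here, unlike the case of attentional weights, the content of the expectation is itself an input that the (quasi-)inference operates over.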

This would not settle all questions concerning cognitive penetration. First, one could still ask whether the accessed expectations, particularly in cases where the effect was on early vision, were in fact cognitive states (beliefs about the future). Second, given recent debates concerning the intended or appropriate Marrian level of Bayesian models (e.g., Bowers and Davis, 2012a,b; Griffiths et al., 2012; cf. Marr, 1982), one might attempt to reinstate a directness challenge elsewhere. Questions of cognitive penetrability are arguably posed at the algorithmic level, but Bayesian models are sometimes put forward as computational-level claims. If so, the question remains open whether at the algorithmic level early vision directly accesses cognitive expectations, or whether the effects of cognitive expectations are rather mediated by effects, say, on imagery (Macpherson, 2012; Block, 2016; though see Gross et al., 2014) or on visual working memory.

<sup>10</sup>In effect, perception says: "This sensory signal is difficult to interpret: it could be caused by a variety of things. But the expectation of there being something red there gives me some reason to think it is more likely the signal was caused by something red than by something blue. So, let us go with that." (We add 'quasi' above in deference to those who reserve the term 'inference' for relevant operations over conceptual representations, e.g., Burge, 2010.)


But, in any event, there is an antecedent problem: attention in fact dissociates from expectation (Summerfield and Egner, 2009, 2016; Summerfield and de Lange, 2014). For example, endogenous cues can direct attention even when they are uninformative about the target. Moreover, neurophysiologically attention is associated with enhanced neural response, while expectation is associated with reduced neural response (Yoshiura et al., 1999). Bayesian models without attentional parameters only handle phenomena where attention and expectation coincide; Bayesian models that attempt to address the dissociation tend to reintroduce attentional parameters (e.g., Whiteley and Sahani, 2012).<sup>11</sup>

It may be replied that this defense of Pylyshyn fails even if only some cognitively driven attentional effects can be treated as resulting from (cognitive) expectations. But the reintroduction of attentional parameters, supposing such models are accepted, argues for a natural divide among phenomena. On such a view, effects not explained attentionally are not attentional effects after all. This might seem a Pyrrhic victory, if the non-attentional expectation effects demonstrate there is cognitive penetration in any event. But this paper is not a defense of cognitive impenetrability tout court, only of the non-penetrability of cognitively driven attentional effects. It remains a question, of course, whether there is expectation-based cognitive penetration (recall the other issues mentioned above). But if there is, it is important to distinguish it from cognitively driven attentional effects. We want to know not just whether there is cognitive penetration, but also, if there is, the details of how it does and does not occur.

### What Should Cognitive Penetrability Be?

Finally, Pylyshyn's characterization of cognitive penetration is not the only one. Some others also preclude attentional effects. For example, Macpherson (2012) explicitly rules out effects of spatial attention (though see Macpherson, 2015 for a change of heart):

. . . perceptual experience is cognitively impenetrable if it is not possible for two subjects (or one subject at different times) to have two different experiences on account of a difference in their cognitive systems which makes this difference intelligible when certain facts about the case are held fixed, namely, the nature of the [effect of the] proximal stimulus on the sensory organ, the state of the sensory organ, and the location of attentional focus of the subject. (Macpherson, 2012, p. 29)<sup>12</sup>

But some alternative characterizations do not preclude attentional effects. Stokes (2013, p. 650) suggests that "[a] perceptual experience E is cognitively penetrated if and only if (1) E is causally dependent upon some cognitive state C and (2) the causal link between E and C is internal and mental." The second constraint rules out overt, but not covert attentional shifts. Wu (forthcoming) argues that cognitively driven attentional effects on perception amount to cognitive penetration by explicitly dropping Pylyshyn's semantic coherence constraint in favor of a weaker statistical, or correlational, notion of information penetration.

Which conception of cognitive penetration should we use? Which gets the phenomenon right, or marks an important joint, or is the most fruitful? In particular, is it Pylyshyn's? A tempting reply is that 'cognitive penetration' is a technical term, which Pylyshyn coined; so how could he fail to get it right? But someone can put their finger on something without quite articulating what matters most. Indeed, the 1999 formulation on which we have focused is itself a modification of Pylyshyn's earlier statements. (See Stokes, 2013 for discussion and references.)

Stokes (2015) suggests we assess the candidates in terms of their consequences—especially their consequences for the questions that drive our interest in cognitive penetrability in the first place. He underscores two kinds of consequence in particular: for questions concerning cognitive architecture and for questions in epistemology. And he argues that Pylyshyn's characterization, though it has an epistemic dimension in virtue of its requirement of semantic coherence, fails to connect with the epistemological questions. We can see this in relation more specifically to cases involving cognitively driven attention by noting, for example, their importance for issues of bias (cf. Lyons, 2015; Wu, forthcoming; Silins, 2016 surveys epistemological questions connected to cognitive penetrability). Some questions of cognitive architecture likewise seem not to turn on semantic coherence: even if valuing or desiring money affects the perceived size of coins (Bruner and Goodman, 1947—but see Landis et al., 1966), it does not provide a reason for this shift.<sup>13</sup>

<sup>11</sup>Attentional parameters are sometimes construed in terms of a different kind of expectation: an expectation concerning precision in the signal, as opposed to an expectation concerning its distal cause (e.g., Feldman and Friston, 2010; though, as it happens, they suggest the effects of endogenous cues do not involve cognitive states). It is suggested that, within a predictive coding framework, one can thus account for differences in neural response associated with attention and expectation concerning the stimulus. (A mechanism that increases gain is typically hypothesized, but recall Pestilli et al., 2011 and Orhan and Ma, 2015, cited above.) However, just as attention can be directed independently of expectations concerning the distal scene, it can be directed in the absence of expectations concerning stimulus precision. Attention thus dissociates from this kind of expectation as well. (An attention shift may cause higher precision, and in this sense it would be reasonable to expect higher precision to result. But the point is that an attention shift need not result from an independent expectation of precision. Thus, it cannot be construed as reason-giving.) For criticisms, consonant with our views, of predictive coding accounts of attention, see Ransom et al. (2017).

<sup>12</sup>Macpherson (2012, pp. 43–46) is inclined, however, to allow feature-based attention. Incidentally, her formulation does not preclude other indirect effects: her paper defends an indirect mechanism—via mental imagery—for the cognitive penetration of color experience. See Gross et al. (2014) for discussion.

<sup>13</sup>Semantic coherence is also not relevant for various questions concerning the causal effect of non-cognitive, but non-perceptual, states (e.g., states directing motor systems and some emotional states) on perceptual states. Of course these are not then questions of cognitive penetrability, but they are interesting questions nonetheless and are relevant to debates about modularity, on some characterizations.

But an alternative approach, which Stokes mentions but does not develop, would consider the consequences for various debates one at a time, instead of attempting to find a single characterization of cognitive penetrability that fits them all (cf. Siegel, 2015). Pylyshyn's characterization (pace Stokes, 2013, p. 659, fn. 5) has a specific motivation, outlined above: to see whether early vision is "continuous" with cognition in virtue of early visual states standing in the same kind of relation to cognitive states that cognitive states can stand in with regard to one another. This renders of interest questions formulated using his characterization regardless of their bearing on other questions also of interest.<sup>14</sup>

There are many phenomena and questions of interest here. We can be pluralists about our interests. As for the label 'cognitive penetrability,' since discussion has proceeded in several directions, we can be pluralist about that as well, so long as we are clear. This does not mean that any characterization of 'cognitive penetrability' is as good as another. Some may have no interest at all. Which do will get sorted out in the light of further investigation, theoretical and empirical. We have argued that Pylyshyn's question is of interest and that his answer, regarding attentional effects, is correct, although not for the reason he emphasizes.

#### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

#### ACKNOWLEDGMENTS

For helpful correspondence, discussion, and feedback, I thank Jacob Beck, Marisa Carrasco, Chaz Firestone, Jonathan Flombaum, Athanassios Raftopoulos, Brian Scholl, and the referees.


<sup>14</sup> Firestone and Scholl (2015, 2016) provide a different motivation for characterizations that preclude attentional effects: such effects are mainstream in perception science and fairly well-understood, whereas cognitive penetration is supposed to be a surprising, radical claim. Purveyors of the 'all levels' objection might respond that the goalposts have unfairly shifted: that attentional effects are bound up with perception at all levels was not so mainstream when Pylyshyn rejected the claim. We have provided an alternative Pylyshyan reply to the 'all levels' objection. But, in any event, perhaps Firestone and Scholl could add that, if we want to keep our questions interesting, shifting the goalposts is the right thing to do as knowledge progresses.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Gross. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# No Evidence of Narrowly Defined Cognitive Penetrability in Unambiguous Vision

#### Nikki A. Lammers<sup>1,2</sup>\*, Edward H. de Haan<sup>1,2</sup> and Yair Pinto<sup>1</sup>

<sup>1</sup> Department of Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Department of Neurology, Academic Medical Centre, Amsterdam, Netherlands

The classical notion of cognitive impenetrability suggests that perceptual processing is an automatic modular system and not under conscious control. Near consensus is now emerging that this classical notion is untenable. However, as recently pointed out by Firestone and Scholl, this consensus is built on quicksand. In most studies claiming perception is cognitively penetrable, it remains unclear which actual process has been affected (perception, memory, imagery, input selection or judgment). In fact, the only available "proofs" for cognitive penetrability are proxies for perception, such as behavioral responses and neural correlates. We suggest that one can interpret cognitive penetrability in two different ways, a broad sense and a narrow sense. In the broad sense, attention and memory are not considered as "just" pre- and post-perceptual systems but as part of the mechanisms by which top-down processes influence the actual percept. Although many studies have proven top-down influences in this broader sense, it is still debatable whether cognitive penetrability remains tenable in a narrow sense. The narrow sense states that cognitive penetrability only occurs when top-down factors are flexible and cause a clear illusion from a first person perspective. So far, there is no strong evidence from a first person perspective that visual illusions can indeed be driven by high-level flexible factors. One cannot be cognitively trained to see and unsee visual illusions. We argue that this lack of convincing proof for cognitive penetrability in the narrow sense can be explained by the fact that most research focuses on foveal vision only. This type of perception may be too unambiguous for transient high-level factors to control perception. Therefore, illusions in more ambiguous perception, such as peripheral vision, can offer a unique insight into the matter. They produce a clear subjective percept based on unclear, degraded visual input: the optimal basis to study narrowly defined cognitive penetrability.

Keywords: high-level factors, bottom-up processing, visual illusion, peripheral vision, cognitive penetrability, uniformity illusion, modular system, perception

#### Edited by:

Gary Lupyan, University of Wisconsin-Madison, United States

#### Reviewed by:

Dwight Kravitz, George Washington University, United States

Maria Olkkonen, Durham University, United Kingdom

> \*Correspondence: Nikki A. Lammers n.a.lammers@uva.nl

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 25 September 2016 Accepted: 09 May 2017 Published: 10 July 2017

#### Citation:

Lammers NA, de Haan EH and Pinto Y (2017) No Evidence of Narrowly Defined Cognitive Penetrability in Unambiguous Vision. Front. Psychol. 8:852. doi: 10.3389/fpsyg.2017.00852

## BELIEVING IS SEEING

Why do we see things the way we do? This fundamental question of how perceptual input is translated into a subjective experience of the world has been discussed for decades. We effortlessly perceive a rich visual world, even though sensory input is often noisy or unreliable. For example, in peripheral vision large numbers of rods provide input to only a single ganglion cell. Therefore, in peripheral vision retinal input is fairly crude and less sensitive to color information than the fovea (Westheimer, 1982; Anderson et al., 1991). Yet we perceive the world as rich in color and detail (Lamme, 2006; Block, 2007, 2011; Rahnev et al., 2011). So how can human perceptual experience be so clear, when it is often based on unclear input?

There are currently two major, but conflicting, answers to the question of why we see things the way we do. The first answer is the classical bottom-up view. The classical view states that our visual experience is purely based on a sensory/bottom-up signal, translated according to fixed rules (that may involve world knowledge). A highly influential psychologist in this regard is J. J. Gibson (1904–1979). Gibson states that vision is purely based on information from the environment and that it is not affected by cognitive construction or processing. Gibson's view is also known as ecological psychology (Gibson, 1966). This bottom-up processing is often considered cognitively impenetrable. Cognitive impenetrability can be defined as the inability to consciously and purposefully modulate the processing of a mental operation that is thought to be carried out in an automated, unsupervised manner, such as basic sensory perception. This modular system is domain specific and its operation is mandatory (Fodor, 1983). Although some theories about the visual system are based on this concept, the classical view cannot clearly explain how noisy input is often experienced as a rich visual percept and how object recognition is influenced by contextual information [see, e.g., Bar (2004) for a review on object perception]. It seems that theories based on purely bottom-up processing (without any influence of top-down processes) do not hold, and have become outdated.

The second answer to the question of why we see things the way we do is the alternative top-down view. In contrast to the classical view, the alternative view states that our perception is affected by transient internal states, such as wishes, expectations and beliefs. This latter view, also known as cognitive penetrability (CP), claims that (intentions of) actions can change our perception through flexible priors. Higher-level cognitive states routinely penetrate our perception, such that what we see is an alloy of bottom-up factors and beliefs, desires and motivations. The brain continually updates its model of the world based on a Bayesian weighing of sensory input (bottom-up) and prior expectations (top-down) (Knill and Pouget, 2004; Clark, 2013; Summerfield and de Lange, 2014; Pinto et al., 2015). Our perception is cognitively modulated in many ways, for instance, in brightness illusions (Adelson, 1993), Ramachandran's scotoma (Ramachandran and Gregory, 1991), or motion-induced blindness (Bonneh et al., 2001). Other examples of the modulation of perception are illusions based on general cognitive rules, such as the Ames window (Ittelson, 1952) and hollow faces (Gregory, 1970; Hill and Bruce, 1993). In these illusions unusual objects or shapes give systematic errors, as they are in conflict with fixed rules or general knowledge.
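The Bayesian weighing of sensory input against prior expectations mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the sensory estimate and the prior are Gaussian, in which case the combined estimate is a reliability-weighted average; the function name `combine` and all numbers are hypothetical and are not taken from the cited studies.

```python
from typing import Tuple


def combine(sensory_mean: float, sensory_var: float,
            prior_mean: float, prior_var: float) -> Tuple[float, float]:
    """Combine a Gaussian sensory estimate with a Gaussian prior.

    Each source is weighted by its reliability (inverse variance),
    so noisier sensory input gives the prior more influence.
    """
    sensory_reliability = 1.0 / sensory_var
    prior_reliability = 1.0 / prior_var
    w_sensory = sensory_reliability / (sensory_reliability + prior_reliability)
    mean = w_sensory * sensory_mean + (1.0 - w_sensory) * prior_mean
    var = 1.0 / (sensory_reliability + prior_reliability)
    return mean, var


# Same prior (mean 0, variance 1) and same sensory reading (10), differing only in noise.
reliable_mean, _ = combine(10.0, 0.25, 0.0, 1.0)  # low-noise input
noisy_mean, _ = combine(10.0, 9.0, 0.0, 1.0)      # high-noise input

print(reliable_mean, noisy_mean)
```

Under these assumptions the estimate stays close to the sensory reading when the input is reliable and is pulled toward the prior when the input is noisy, which is one way to picture why degraded input might leave more room for top-down expectations.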

Over the last few years, many studies have claimed to have proven CP without the use of these illusions based on fixed rules or general knowledge. For example, studies show that a bottle of water looks closer when we are thirsty (Balcetis and Dunning, 2010), that social expectations affect basic perceptual experiences, i.e., faces with African American features look darker (Levin and Banaji, 2006; Zhong and Leonardelli, 2008), and that words are easier to detect when they are morally relevant (Gantman and Van Bavel, 2014). Most researchers consider these results such pervasive evidence of CP that the classical notion of cognitive impenetrability is often considered to be untenable.

Although many studies claim to have proven CP, Firestone and Scholl pointed out some significant problems in most of these experiments (Firestone and Scholl, 2015). They state that perceptual top-down research "falls prey to a set of pitfalls." Roughly speaking, there are two major problems. The first problem within this field is that most results reflect top-down processes in early visual selection through attention shifts. Research has shown that selection of input can be under top-down control, for instance through eye movements or attention shifts. However, this fails to prove that, after selection, the translation to a percept is under top-down control. For example, inattentional blindness (Mack and Rock, 1998; Most et al., 2005; Ward and Scholl, 2015) might be a failure to see or memorize (Wolfe, 1999; Lamme, 2003) what we do not attend to. According to Lamme (2003), attention and conscious perception might be two separate systems, in which attention is needed to store our actual perception in working memory and to be able to report it afterward. According to this theory, it remains unclear whether inattentional blindness is a result of insufficient attention, insufficient perception or insufficient conscious memory.

The second problem is that experimental results are often not direct proof of a change in perception per se, but possibly reflect, for instance, our judgment. When we are thirsty, we may directly see that a bottle of water is closer, or we may merely assume or conclude that it is closer. Another example is the study of Wesp and Gasper (2012). In an earlier experiment they found that less accurate dart throwing led to smaller estimates of target size, as if one's performance perceptually resized the target (Wesp et al., 2004). However, when they replicated this experiment in 2012, subjects were told that the darts were defective. This additional instruction eliminated all correlation between performance and reported size of the target. This result indicates that if an experiment shifts perceptual reports, the shift may reflect changes in judgment rather than changes in perception. Other examples of studies that possibly do not reflect a change in perception, although claiming to do so, are experiments using neuroimaging and electrophysiology. Although feedback connectivity is often interpreted as a top-down effect, in which higher brain regions are assumed to modulate lower brain regions through descending neural pathways (Bar et al., 2006; Gilbert and Li, 2013), such imaging studies are by definition correlational. Specific neuronal interactions and feedback connectivity might reflect our visual percept, but could also reflect, for example, recall (Le Bihan et al., 1993) or imagery (Kosslyn, 2005). Thus, activation that is registered via an electrode or MRI scanner might not always be necessary for, or even directly related to, perception. Even when neuroimaging data do reflect a direct effect of feedback processing on perception, for example in unconscious inferences, this process is not under conscious control. Neural or behavioral data can be very useful in supporting claims of perceptual changes caused by controlled top-down processes, but such data are not conclusive by themselves.

The experimental pitfalls pointed out by Firestone and Scholl make it arguable whether perception is indeed cognitively penetrable or whether most of these studies are methodologically insufficient. The pitfalls listed by Firestone and Scholl mostly rest on the assumption that attention is pre-perceptual and memory is post-perceptual, and that it is often not clear which actual process has been affected. However, it is debatable whether attention and memory should be considered as purely pre- and post-perceptual systems, or as part of the mechanisms by which top-down processes influence the actual percept and thereby as part of the visual system (Lupyan, 2016).

We suggest that one can interpret CP in two different ways, in a broad and a narrow sense. The broader sense of CP suggests that attention and memory are part of the visual system, and that top-down processes can influence the perceptual system. In this definition, perception is penetrable when top-down processes change attention, perception or memory. If CP is interpreted in the broad sense, many studies have provided fairly strong evidence of CP. For example, scene knowledge affects perception of edge orientations (Neri, 2014), knowledge of the real-world size of, e.g., a basketball affects apparent speed of motion (by altering perception of distance) (Martín et al., 2015), knowledge of usual object colors shades our color perception (Hansen et al., 2006; Olkkonen et al., 2008; Witzel et al., 2011; Kimura et al., 2013) and influences the intensity of color afterimages (Lupyan, 2015a,b), and hearing the right word can make something visible that is otherwise invisible (Lupyan and Ward, 2013).

In the narrow sense of CP, however, the notion of CP is less obvious. We define narrow CP as follows. Narrow CP occurs when flexible factors (that can be learned and unlearned) affect perception, after the effects of attention, selection and memory are dismissed (see Vance and Stokes, 2016 for a similar definition). According to this narrow definition of CP, the pivotal question is whether selected malleable top-down factors can still affect perceptual experiences after sensory input (attention) and before reporting (memory). Two requirements need to be met before narrow CP is established. First, perception itself has to be unambiguously influenced by top-down processes. Second, these top-down processes must be flexible, in the sense that a healthy adult is able to turn these processes on and off, through training or voluntary decisions. Thus, fixed top-down processes (such as brightness perception being affected by surrounding information) do not count as examples of narrow CP.

In many of the previously mentioned studies (Hansen et al., 2006; Olkkonen et al., 2008; Witzel et al., 2011; Kimura et al., 2013; Neri, 2014; Martín et al., 2015; Witzel, 2016) purely attentional or post-perceptual processes (Lupyan and Ward, 2013) may have caused the observed effects. For instance, it could be argued that scene knowledge primarily affects orientation judgments rather than causing perceptual distortions. Similarly, perhaps real-world knowledge affects speed judgments more than it creates actual illusions in speed perception. Furthermore, studies of binocular rivalry and continuous flash suppression have shown that attention/selection can determine the dominance of a stimulus (Chong et al., 2005). In other words, selection through attention may cause the effects of top-down processes on binocular rivalry and continuous flash suppression.

We acknowledge that it is very difficult to separate out attentional and perceptual effects. Some attention researchers may therefore not share our notion of narrow CP, since it could be argued that attention cannot be separated from perception. However, it is crucial to stress that in our definition of narrow CP, selection or amplifying effects of attention do not constitute narrow CP. These effects of attention on perception clearly occur and are consistently found in both behavior and neural activity. However, in our definition of narrow CP, flexible top-down factors should affect perception after selection has taken place, and in such a way that the contents of perception are altered (not merely the level of awareness). We assert that although it might be difficult, it is not impossible to prove narrow CP after the effects of attention, selection and memory are dismissed. For example, some illusions, such as the McGurk effect (McGurk and Macdonald, 1976), are clearly distortions of perception (from a first-person perspective). In experiments without such a clear subjective distortion, it is hard to prove whether perception, or pre- or post-perceptual processes, are affected. We therefore assert that in order to prove CP according to the narrow definition, we need to focus on perception from a first-person perspective instead of (or in addition to) using proxies for perception, for example by using clear visual illusions. Only when top-down, malleable factors cause a clear illusion from a first-person perspective can strong claims about the narrow definition of CP be made.

Importantly, although visual illusions may be considered as proof for CP in the broader sense, awareness and understanding of the illusion cannot make them unseen and therefore most visual illusions cannot (yet) directly provide evidence for narrow CP. These illusions seem to be caused by fixed rules, which are hardwired into the visual system.

In conclusion, we claim that there is currently decisive evidence for CP when defined broadly, but not (yet) for CP in the narrower sense.

# PERIPHERAL ILLUSIONS

Here we take a critical position toward the existence of narrow CP, i.e., the occurrence of flexible, learnable top-down factors affecting the contents of perception (as shown through clear illusions) while dismissing the effects of attention and memory. We want to point out, however, that the current lack of evidence for narrow CP does not necessarily imply that it does not exist. An alternative explanation for the absence of proof might be the fact that in nearly all illusion studies, stimuli are presented foveally, while they are attended. Since the signals from the fovea are often high fidelity, bottom-up input requires less or no direct top-down influences. In contrast to these clear foveal signals, the resolution in our peripheral vision is roughly equivalent to "looking through a frosted shower door" (Eagleman, 2001). We suggest that with noisy sensory input, like this peripheral frosted shower door, we have a much better chance of finding evidence that noisy bottom-up signals might be influenced by first-person factors, such as personal traits, experiences and beliefs. Sensory signals can also be ambiguous in the foveal part of the retina, as they confound information from surfaces and illuminants, and because of the 3D to 2D projection. Even so, the essential difference between a noisy foveal image and an image in the periphery is that in one case the external input is noisy, while in the other case the input is clear but the processing is noisy. The difference can be understood as follows. Imagine two reporters; one is a very reliable reporter while the other one is extremely chaotic and unreliable. When the very good reporter (i.e., the fovea) reports to the control room that the situation is disorderly, and there are riots everywhere, the control room will simply conclude that that is the current state of affairs. However, when the chaotic reporter (i.e., the peripheral signal) delivers an incoherent report, the control room will try to use best guesses to really understand what is going on. In other words, when the fovea reports to the brain that the external stimulus is noisy, the brain has no reason to override this report, and thus no flexible illusion will be created (only illusions based on fixed rules). However, when the periphery transmits a low-fidelity report, the brain may augment this report, and thus possibly create a visual illusion based on transient cognition.
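The two-reporter analogy can be illustrated with a small numerical sketch. It assumes a standard Gaussian cue-combination model in which an expectation and a sensory report are weighted by their reliabilities; this is a textbook model, not a model proposed in this article, and the function name `combine` and all numerical values are hypothetical. Note that the sketch only captures the reliability of the internal report. It does not capture the further distinction drawn above between noise in the external stimulus and noise in the processing.

```python
def combine(prior_mean, prior_sd, sensory_mean, sensory_sd):
    """Combine a Gaussian prior with a Gaussian sensory report.

    Returns (posterior_mean, posterior_sd, sensory_weight), where
    sensory_weight is the share of the estimate driven by the report.
    """
    prior_precision = 1.0 / prior_sd ** 2
    sensory_precision = 1.0 / sensory_sd ** 2
    posterior_precision = prior_precision + sensory_precision
    sensory_weight = sensory_precision / posterior_precision
    posterior_mean = (sensory_weight * sensory_mean
                      + (1.0 - sensory_weight) * prior_mean)
    posterior_sd = posterior_precision ** -0.5
    return posterior_mean, posterior_sd, sensory_weight


# Hypothetical values: the expectation says "0" (e.g., the feature value
# seen at fixation), while the actual stimulus value is 20.
PRIOR_MEAN, PRIOR_SD = 0.0, 10.0
STIMULUS = 20.0

# Reliable reporter (fovea): small sensory noise.
foveal = combine(PRIOR_MEAN, PRIOR_SD, STIMULUS, sensory_sd=2.0)
# Chaotic reporter (periphery): large sensory noise.
peripheral = combine(PRIOR_MEAN, PRIOR_SD, STIMULUS, sensory_sd=20.0)

print("foveal estimate:     %.2f (sensory weight %.2f)" % (foveal[0], foveal[2]))
print("peripheral estimate: %.2f (sensory weight %.2f)" % (peripheral[0], peripheral[2]))
```

With these illustrative numbers, the foveal estimate stays close to the stimulus, whereas the peripheral estimate is pulled most of the way toward the expectation. This is the regime in which, on the argument above, learnable priors would have the best chance of altering the percept.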

Peripheral vision becomes especially noisy during long fixations (Clarke, 1961; Martinez-Conde et al., 2006), in which (parts of) perception flexibly adopt a new identity based on global visual information and possibly high-level factors. Perhaps, peripheral illusions based on such long fixations could prove an effect of cognitive contents. They might be more sensitive to learnable priors and less driven by automatic algorithms, as their bottom-up signal is noisy. One striking visual illusion in peripheral vision is the uniformity illusion (see **Figure 1**)<sup>1</sup>. This illusion suggests that the detailed peripheral visual experience is partially based on a reconstruction of reality. In a visual display where central stimuli differ from peripheral stimuli on specific properties, central stimuli appear to overflow into the periphery for extended periods of time. Observers thus perceive the stimuli in the periphery to take on the properties of the central stimuli, resulting in a uniform field encompassing the center and the periphery of the display (Otten et al., 2016). This uniformity illusion has been demonstrated for a wide range of visual features, such as luminance, orientation, motion and texture. Importantly, unlike most other visual illusions, this is an illusion based on weak sensory processing. Although it seems likely that the illusion is (at least partly) driven by fixed rules and automatic algorithms just as other visual illusions are, more research is required in order to answer the question whether learnable priors can affect this illusion. Its ambiguous nature, its global effect on perception and the wide range of visual features in which this illusion occurs, provide the ideal circumstances to study how the brain constructs visual illusions and to what extent such illusions are cognitively penetrable.

To summarize, there are still some disagreements concerning the role of cognitive penetrability in visual perception. We do not debate the existence of CP in the broader sense. However, in our definition of narrow CP, attention and memory are considered to be pre-perceptual and post-perceptual processes. Moreover, cognitive penetrability only occurs when flexible, learnable factors affect the contents of perception, after the effects on attention and memory are dismissed. We argue that narrow CP has not (yet) been proved, since most evidence for cognitive penetration is based on methods that employ proxies for perception. The one data point that could really prove narrow CP is a clear illusion from a first-person perspective. However, so far, clear illusions do not support narrow CP, as these illusions cannot be unseen (i.e., they are driven by unchangeable rules).

To provide more insight into the matter, future research should focus on cognitively induced perceptual illusions when the sensory signal is noisy, such as during the uniformity illusion. Interesting research questions would be: which functional manipulations affect the uniformity illusion? Or, how can prior expectations influence this illusion? For example, subjects could be divided into two groups: subjects in the first group are given no priors, whereas subjects in the second group are first given correct priors about the stimuli in the periphery, followed by false priors. Can these correct or false priors strengthen or weaken the uniformity illusion? And does changing the priors within subjects change the perception of the same stimuli? If future research indeed verifies that illusions can be affected through learnable cognitive priors when sensory input is unreliable, then the notion of cognitive penetrability receives clear proof, even when it is defined narrowly. However, if even under these circumstances narrow CP does not occur, then it becomes doubtful whether narrow CP exists at all. In that case, the effects of cognition are probably either purely based on post-perceptual processes (e.g., memory, judgment) or pre-perceptual processes (input selection), or driven by fixed, unlearnable factors.

<sup>1</sup>http://www.uniformillusion.com

# AUTHOR CONTRIBUTIONS

All authors share a critical position toward cognitive penetrability in visual perception, and agree on the significance of cognitively induced perceptual illusions in peripheral vision. NL drafted the manuscript, which was adjusted based on feedback from YP and EdH. All authors approve the manuscript.

## ACKNOWLEDGMENT

None of the authors have financial or other conflicts of interest to report. This work was supported by ERC grant FAB4V (#339374).

### REFERENCES

Gregory, R. L. (1970). The Intelligent Eye. London: Weidenfeld & Nicolson.

Hansen, T., Olkkonen, M., Walter, S., and Gegenfurtner, K. R. (2006). Memory modulates color appearance. Nat. Neurosci. 9, 1367–1368. doi: 10.1038/nn1794

Kosslyn, S. (2005). Mental images and the brain. Cogn. Neuropsychol. 22, 333–347.

Mack, A., and Rock, I. (1998). Inattentional Blindness. Cambridge, MA: MIT Press.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Lammers, de Haan and Pinto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Perception and Cognition Are Largely Independent, but Still Affect Each Other in Systematic Ways: Arguments from Evolution and the Consciousness-Attention Dissociation

#### Carlos Montemayor<sup>1</sup> \* and Harry H. Haladjian<sup>2</sup>

<sup>1</sup> Department of Philosophy, San Francisco State University, San Francisco, CA, USA, <sup>2</sup> Laboratoire Psychologie de la Perception, CNRS, Université Paris Descartes, Paris, France

#### Edited by:

Athanassios Raftopoulos, University of Cyprus, Cyprus

#### Reviewed by:

Maria Olkkonen, Durham University, UK

Robert Lawrence West, Carleton University, Canada

> \*Correspondence: Carlos Montemayor cmontema@sfsu.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 09 September 2016 Accepted: 06 January 2017 Published: 24 January 2017

#### Citation:

Montemayor C and Haladjian HH (2017) Perception and Cognition Are Largely Independent, but Still Affect Each Other in Systematic Ways: Arguments from Evolution and the Consciousness-Attention Dissociation. Front. Psychol. 8:40. doi: 10.3389/fpsyg.2017.00040

The main thesis of this paper is that two prevailing theories about cognitive penetration are too extreme, namely, the view that cognitive penetration is pervasive and the view that there is a sharp and fundamental distinction between cognition and perception, which precludes any type of cognitive penetration. These opposite views have clear merits and empirical support. To eliminate this puzzling situation, we present an alternative theoretical approach that incorporates the merits of these views into a broader and more nuanced explanatory framework. A key argument we present in favor of this framework concerns the evolution of intentionality and perceptual capacities. An implication of this argument is that cases of cognitive penetration must have evolved more recently and that this is compatible with the cognitive impenetrability of early perceptual stages of processing information. A theoretical approach that explains why this should be the case is the consciousness and attention dissociation framework. The paper discusses why concepts, particularly issues concerning concept acquisition, play an important role in the interaction between perception and cognition.

Keywords: cognitive penetrability, consciousness, visual attention, evolution, dissociation, language, concept acquisition

# INTRODUCTION: EVOLUTIONARY ARGUMENTS FOR A PERCEPTION AND COGNITION INTERFACE

This paper critically assesses the view that there are systematic and robust influences from cognition on perception at the early stages of processing, which could be considered cases of cognitive penetration. While we agree with the criticisms that there are empirical "pitfalls" in the experiments allegedly reporting cognitive penetration (see Firestone and Scholl, 2016), there also are difficulties regarding the view that there is a sharp distinction between perception (the processing of sensory information that occurs at several levels) and cognition (the judging of representational contents related to reasoning). Besides being problematic theoretically, the assumption that a sharp distinction between all cognition and all perception must be an essential aspect of the mind may even be empirically false. The criticisms around the notion of penetrability need to be more balanced so that they account for an architecture consisting of some cognitively impenetrable modules (characteristic of early perception) along with others that are susceptible to top-down influences (characteristic of late perception). Such varied effects must be available in perception to understand abilities such as predictive coding and conceptual attention.

We focus on concept acquisition to explain the interface between cognitively penetrable perception and cognitively impenetrable perception, and particularly on the fact that concept acquisition is also a perceptual, rather than a strictly cognitive process involving only reasoning or judgment. Even if the brain's architecture is organized in a modular and encapsulated way, there can still be a conceptual interface between perception and cognition. It is at this conceptual interface, which is also responsible for explicit or discursive judgment and inference, where most interactions between perception and cognition will occur that can contain instances of cognitive penetration. We will explore the issue of concept acquisition at different stages of processing and explain how it relates to top-down pre-cueing. This relation will reinforce our point that a balanced combination of any possible cognitive penetrability and early impenetrability is critical. In fact, we aim to show that conceptual interfaces between cognition and perception are crucial for understanding how our species developed sophisticated forms of attention.

One approach to achieve this balanced interface perspective is based on the consciousness and attention dissociation (CAD) framework (Montemayor and Haladjian, 2015). This framework characterizes the relationship between consciousness and attention, and claims that attention is significantly dissociated from consciousness, with different levels of interactions between attention and conscious awareness. This distinction is important because there is abundant evidence of cognitive effects on attention without conscious awareness (an unconscious form of cognitive guidance) as well as cases in which motivational states guide implicitly, sometimes against the conscious judgments of subjects, as in cases of implicit bias (see Montemayor and Haladjian, 2015, for a discussion of the evidence in vision). These cases of guidance and selection in perception may be conceived as attention routines, and many will be independent of, and even disagree with, conscious perception. Crucially, for the topic of cognitive penetration, CAD allows for the systematic guidance of late perception by cognitively driven attention, while also allowing for the cognitive impenetrability of early perception.

These different types of guidance and influence on late perception (voluntary and involuntary, conscious and unconscious) help clarify some problems concerning extant discussions on cognitive penetration. Some alleged cases of cognitive penetration may readily be excluded, for instance cases of explicit voluntary judgment on perceptual contents that are not even indirectly influenced by beliefs or discursive inference. Some motivational and emotional forms of guidance are more problematic, as they typically occur independently of explicit propositional attitudes, although they can easily be understood as part of the attentional guidance on late perception. But it seems that if all implicit forms of motivational and cognitive guidance are excluded, as defended by the sharp delineation view, then it is too easy to conclude that perception is never penetrable by cognition. We will resist this conclusion by arguing that it is an implausible view of the complexity of perception—and of its evolution. We will also argue against the opposite view that cognitive penetration is widespread, as some proponents of cognitive penetrability propose. Some forms of perception, specifically early perceptual processing, must be impenetrable. The problem is one of balance: there must be systematic forms of influence on perception without major disturbances to the evolutionarily developed and required perceptual invariances for successful navigation and motor control. The dissociation between consciousness and attention provides this more nuanced theoretical approach, and it advances the debate beyond the strict dichotomy between cognition and perception.

In addition, the CAD framework is particularly well suited to address cognitive penetration because it is supported by a large body of findings, specifically in vision science (Montemayor and Haladjian, 2015). The 'early versus late perception' distinction was introduced in order to interpret findings in vision science. Early visual perception includes sensory processes that are specialized for handling specific types of information used in constructing representations independently of beliefs (Pylyshyn, 1999). Late perception involves selective processing by top-down attention and other cognitive processes (Raftopoulos, 2015b). Just as this distinction generalizes to other perceptual modalities and to the more general distinction between cognition and perception, CAD also generalizes to all kinds of dissociations between subjectively conscious experience and attention routines that do not necessitate conscious awareness, including emotions and memory. The central tenet of the CAD framework is that there must be some dissociation between attention and phenomenal consciousness (subjective experience), with some extant theories indicating a high degree of dissociation. Thus, CAD is a framework to better understand, model, and integrate findings and theories on consciousness and attention based on how they are dissociated from each other. In this paper we present the implications of CAD for the topic of cognitive penetration.

The crucial argument we make in support of these claims concerns evolution. Also based on the CAD framework, the argument is as follows.


Therefore, some forms of intentionality are more cognitively penetrable than others, and an interface for penetrability is needed for concept acquisition and global access (including access to propositional content).

This argument shows why evolution matters to the debate on cognitive penetration, and why penetrability is more complicated than previously thought. CAD can help explain the relationship between cognition and perception, and indicate where cases of penetrability may occur. For instance, one possibility is that there may need to be two interfaces between cognition and perception, one concerning phenomenally conscious experiences and another concerning non-phenomenally conscious perceptual contents. Such interfaces will be critical for all kinds of conceptual and pre-conceptual learning that guide attention routines.

A discussion about what is meant by 'cognitive penetration' is required to fully understand the implications of this argument. By 'cognitive penetration' most authors intend a general category of cognitive influences on how perceptual information is processed by sensory mechanisms, which includes cases in which the beliefs and desires of perceivers somehow determine what they perceive. This of course can be interpreted in many ways. The demarcation between cognitively penetrable and impenetrable perception was originally proposed to understand cognitive architecture, but it now encompasses cases in which top-down attention influences bottom-up early attention routines, independently of specific commitments regarding architecture (Vetter and Newen, 2014). As mentioned, views at one end of the possible degrees of penetrability deny that cognitive penetration captures a truly unique type of influence of cognition on perceptual processing (e.g., Firestone and Scholl, 2016). Such views would never consider systematic influences of cognition on perception as legitimate cases of cognitive penetration. On the other hand, views that state that there is no boundary between cognition and perception deny that cognition could be dissociated from perception (e.g., Clark, 2013).

Thus, a critical issue is how to clearly specify legitimate cases of cognitive penetration—cases in which the influence of cognition on perception is not trivial or easily explained by appeal to inference (Firestone and Scholl, 2016), or some other cognitive process such as judgment or interpretation. This becomes especially important when authors arguing for the case of penetrability do this by giving examples of changes to higher levels in perception, those that are beyond the initial stages of sensory processing. For example, some findings indicate that throughout the stages of perceptual processing there are both forward and backward neural projections that contribute to perception (e.g., Vetter and Newen, 2014). Yet, these do not necessarily indicate that early perception is penetrable by cognition. We argue that the more interesting cases of cognitive penetration would not be at the higher level of perceptual judgment or the interpretation of the output from sensory processing. Nor would they be cases where voluntary attention simply changes the perceptual stimulus or input (e.g., looking to the left based on my desire to change my gaze should not count as a case of cognitive penetration). Radical cases of penetrability would influence perceptual processing directly at early stages, and not simply at a higher attentive (or cognitive) level.

More specifically, the most problematic form of cognitive penetration would have to occur at the level of processing called 'early vision' or early perception more generally (see Pylyshyn, 1999). Instances of radical cognitive penetrability should show that perception, particularly early perception, cannot "resist" the influence of content coming from inferences, beliefs, or desires. This could happen quite selectively: not all beliefs and desires can directly affect perception, but only some specific ones in specific situations. What is crucial is that if radical cognitive penetration exists, then there is the possibility of causal influences from cognition that directly modify perception, even when all else is equal at the sensory input level, including how attention is being allocated. This causal influence must explain directly how early perception is processed; otherwise, purely conceptual influences could explain cognitive penetration (see Raftopoulos, 2014, pp. 605–606 for discussion). We shall argue against this radical form of cognitive penetration.

Cognitive penetration is a crucial topic in philosophy of perception because of how it relates to controversial issues in epistemology or the theory of knowledge. For instance, there is the view that the contents of perception are propositional (i.e., they have truth conditions, just like the propositions expressed by sentences), and that perception is akin to belief, a kind of propositional attitude (Byrne, 2005). There is also the view that perception need not have propositional content (Crane, 2009). This issue is clearly related to the topic of non-conceptual content in perception. In these debates, it is generally taken for granted that the focus of analysis is perceptual conscious experience. But CAD shows this is an assumption that should not be taken for granted, because what is true about phenomenally conscious perception need not be true about perception in general: there are types of non-phenomenally conscious perception, as in blindsight (e.g., see Kentridge, 2011). More important, CAD explains why these apparently opposite views could be true about different types of perception: one cognitively penetrable at the propositional, later perceptual level and the other cognitively impenetrable at the non-conceptual, early perceptual level. As we argue below, this is actually a consequence of the argument from evolution.

To illustrate the importance of CAD to understand different types of cognitive impenetrability, consider the most basic kind of conscious experience, for instance of color. One possibility CAD allows for is that early color perception is experienced in the exact same way as in other organisms that lack the top-down routines dependent on cognitive capacities. This possibility plays a major role in motivating the notion of phenomenal consciousness, particularly for "first order" theorists, who deny that experiences must be part of a thought or representation for them to be conscious. This approach suggests that many species, certainly mammals, must have phenomenal experiences that are analogous to human phenomenal consciousness. For such overlap in experiences of color, it seems necessary to adopt the view that early vision color is impenetrable (for dissent see Macpherson, 2012). So what about color perception that is processed at the interface with working memory, conceptual categorization, and motivational guidance (e.g., perceptually judging the typical color of an object or evaluating the beauty of a combination of colors)? At this level, it is clear that color perception would be susceptible to different kinds of top-down effects, and these could count as cognitive penetration at later stages of processing. In humans, these two types of perceptual processing come apart, and only CAD makes sense of this possibility: conscious early (bottom-up) vision without top-down attention modulation and conceptualized color detection, susceptible to cognitive and motivational modulation. An intriguing possibility, entailed by the argument from evolution, is that some animals experience color in a modular, and more encapsulated way, because they lack the conceptual interfaces required for late perceptual modulation and judgment.

The consciousness and attention dissociation thus helps us understand the cognitive impenetrability of early perceptual processes, without maintaining that there is no room for cognitive penetrability at more integrated levels of perception and cognition, in a way that generates an interaction between these levels. It also facilitates the theoretical characterization of cognitive influences on unconscious perception that play no role in conscious experience, and vice versa. Combined with the argument from evolution, CAD justifies the impenetrability of early perception based on the importance of perceptual invariances to navigate the environment, for example, which must have evolved early on, independently of cognitive and motivational influences. It is precisely because different kinds of intentionality evolved at different times that there must be interfaces between perception and cognition, some of which need not be fully fledged conceptual inference. This is why processes involved in concept acquisition are relevant for striking a balance between the 'pervasive cognitive penetration' and 'no cognitive penetration' views.

Like any theoretical category, that of 'early vision' (which can be extended to early perception) has fuzzy boundaries. There is agreement, however, that early vision must include modularly specific (cognitively impenetrable) feature detection, such as color, motion, or orientation, typically before the involvement of working memory. It may also involve objecthood, without the cognitive imprint of conceptual categories. One may say that at the very first stages of perception, there is sentience of phenomenally experienced features, structured spatially and temporally, which can be cross-modally integrated by feature maps. This processing must preserve external invariances concerning light reflectance, shape, distance, and duration (among many other invariances that allow for reliably accurate navigation and coordinated motor control). In this sense, perceptual invariances are preserved by cognitive impenetrability from motivational and conceptual attention modulation (at least in humans). The later involvement of working memory allows for such cognitive and emotional modulation, and what was consciously experienced without the imprint of categorization is now experienced under a conceptual or motivational influence or category. This cognitive transition has implications for how to understand perception in other species and also with respect to the evolution of our own perceptual system. This is one of the reasons why CAD and the argument from evolution must inform our understanding of cognitive penetration.

Based on these considerations, it seems that there are two kinds of cognitive impenetrability: phenomenally conscious (basic feature perception) and non-phenomenally conscious (feature detection outside of awareness). Likewise, there might be two kinds of cognitive penetrability, one phenomenal (motivational influences on perception) and the other nonphenomenal (conceptual influences in blindsight-like detection). Once conceptual capacities are in the picture, however, one can always interpret perceptual contents by providing a propositional explanation or interpretation. Consider the contrast between explaining and directly causing the contents of perception. In typical cases of automatic or effortless inference, you can infer that someone is late by looking at their facial expression or how they are looking at their watch, but this does not mean that you are seeing "lateness." Emotion perception is more complicated, but it might be susceptible to similar interpretative treatments (for dissent, see Siegel, 2006; Newen and Vetter, 2017). We can infer someone's joy through their facial expressions, but we do not necessarily see the actual feeling of joy. In this sense, inference can influence what someone perceives without changing radically how the visual system perceives environmental features, which would remain impenetrable. What causes the contents of perception at early stages remains untouched by top-down modulation.

Such inferential influences could be implicit and not depend on any kind of voluntary guidance. The notion of 'inference' is flexible enough that it could apply at all stages of perceptual predictive processing (see Clark, 2013), where such processing can be influenced by the statistical properties of experiences or contexts (e.g., see Yuille and Kersten, 2006). This more flexible notion seems to problematize the distinction between impenetrable and penetrable perception, but once the CAD framework is in place, one can argue, based partly on the argument from evolution, that early perceptual statistical processing need not be considered susceptible to any top-down influence. Such probabilistic information about perceptual properties is compatible with encapsulation (Raftopoulos, 2015a).

A critical point that deserves emphasis is that cognitive penetration should not jeopardize the stable invariances of perception. This constraint is particularly important for results that aim to show putative forms of penetrability concerning basic information for navigation, such as information concerning distance and depth. If penetration occurs in these cases, it must be shown that it is not pervasive to the degree that someone who is simply walking out of a room would be disoriented by changes in size, distance, and depth that are based on her beliefs and desires. If cognitive penetration entailed this kind of disruption of basic perceptual invariances, then such cases of penetrability would be just as disorienting as hallucinations, if not more so. Typically, hallucinations are explained in terms of changes in physiology (e.g., a deliberate neurophysiological change caused by ingesting certain drugs), rather than simple changes in belief and desire. Thus, an important constraint is that cognitive penetration should not be conceived in ways that would entail radical alterations to perception, analogous to those caused by physiological changes from external sources. Perception (e.g., early vision) must preserve invariances reliably. For truly radical cognitive penetration to occur, there must be evidence that top-down conceptual information influences the early stages of visual perception beyond simply facilitating the processing of visual information (e.g., attentional effects) (Raftopoulos, 2015b).

As mentioned, another important consideration is the notion of intentionality (i.e., the way in which mental representations are about things and features in the world) and how evolution can explain it. Intentionality may be very basic, processed in a modular fashion, and responsive to immediate information from the environment, or it can be more abstract, categorical, and influenced by judgments and inferences. Various forms of intentionality will correspond to the evolutionary record of such capacities, as well as how widespread they are across species (the earlier, the more widespread). Intentionality will require a conceptual interface at some level, at least in humans, especially when faced with novel stimuli or situations that demand categorization. It is this area of conceptual development that requires scrutiny in terms of potential interfaces for cognitive penetrability of late perceptual stages of processing.

The acquisition of concepts for perceptual categories, we propose, is the best example of why an interface between perception and cognition is needed. Interesting cases of cognitive penetration could be defined in terms of such interfaces concerning concept acquisition, and this is the strategy we follow here. An important question is whether there are pre-cueing effects on concept acquisition. Since pre-cueing determines how attention is allocated and can change the background neural activity in a way that helps determine what is perceived, it may also determine or bias how a concept is obtained or categorized through perception. The relation between categorical reasoning and categorical capacities based on what ethologists call 'fixed action patterns' is one that deserves attention in this regard. A thorough evaluation of the evolution of intentionality across different species should include an examination of pre-cueing effects on these proto-conceptual intentional representations.

# DEFINING AN INTERFACE FOR COGNITIVE PENETRATION THAT DOES NOT JEOPARDIZE EARLY PERCEPTION

A more essential starting point is to define what is meant by perception and cognition. Perception is the processing of external information by the sensory systems, such as visual or auditory information. It has various stages, and can be broadly divided into early perception, which is composed of encapsulated sensory processing modules (e.g., see Pylyshyn, 1999; Raftopoulos, 2015b), and late perception, which includes multi-modal integration, event perception, and object recognition (e.g., see Cavanagh, 2011). Perceptual information processing often leads to the subjective experience of that information, for example, of seeing an object or hearing a sound. Yet sensory processing does not need to enter conscious awareness to be perceptually registered; much of it can happen in the background. Importantly, perception is considered to be essentially a "belief-independent" process (particularly the early kind). A key question, notoriously difficult in epistemology, is how such belief-independent processes can justify beliefs. Again, this issue concerns the interface between perception and cognition.

When I see an apple, for example, my visual system is processing information about the features of this object, but how exactly is such processing related to the justification of my belief that I am seeing an apple? If all I perceive is shape and color, then the justification of my belief is mostly independent of perception and it must be some kind of inference. But there is no problem in saying that I see an apple (or that I see an object as an apple), and that what I see justifies my beliefs because of the top-down modulation of concepts. This is compatible with the encapsulation of color and shape perception, and CAD is particularly helpful in explaining how this is possible. This helps solve the problem of how epistemically unjustified early processing gives rise to perceptually justified beliefs through the top-down influences of concepts on late perception.

Cognition involves more deliberate modulation by top-down processes, like using focused attention to search for a specific object, and includes action-planning, self-reflection, and abilities related to language. All of these processes are closely linked to consciousness and propositional content (specifically the so-called 'access consciousness'; Block, 1995). These processes are generally epistemic, but they can also include more complex forms of cognition and conscious experience, like aesthetic and moral judgments. The implication of radical cognitive penetration is that such goal-oriented higher-level processes can directly affect the way in which information is initially processed by sensory systems, such that they affect feature detection (e.g., the color of the object to be found). We shall argue that they can only alter feature detection indirectly, by the modulation of late perception.

The question at issue is just how much cognition can affect low-level perceptual processes. Will this be a form of pre-cueing that simply directs impenetrable modules and routines, or does it actually affect the processing of perceptual information within the module (beyond attentional effects)? Is any aspect of low-level perception truly cognitively penetrable? Given the constraints mentioned above, as well as the argument from evolution, the answer is that cognitive penetration cannot be pervasive, and if it happens, it has to happen at the right level (e.g., late perception, after the intervention of at least working memory) so that perceptual invariances are not affected and basic abilities necessary for survival, such as navigation, are possible. To reiterate, early perception is not likely to be susceptible to any kind of cognitive penetration. One possibility, compatible with CAD, is that access conscious penetration of perception may occur without phenomenally conscious penetration of early perceptual experiences and vice versa. With respect to phenomenal consciousness, a similar distinction is unproblematic: early phenomenal conscious vision may be nonconceptual and then phenomenal concepts are deployed to categorize experiences (see Loar, 1997).

As mentioned, some authors argue that cognitive penetration never genuinely occurs. Instead, what falls under the category of "penetration" is judgment or cognition, and it never affects perception as such (see Firestone and Scholl, 2016). Other authors defend the view that cognitive penetration affects perception in all sorts of ways, such that belief systematically alters perception (e.g., Siegel, 2006, 2010; Stokes, 2012). This is argued to occur even at the earliest stages of processing. Given the amount of top-down influence on perceptual processing on a neural level (e.g., see Gilbert and Li, 2013; Vetter and Newen, 2014), this view is not implausible. According to this pervasive-penetrability view, our beliefs, desires, and goals affect perception in multiple ways. What we perceive, therefore, is susceptible to a vast array of cognitive influences.

The pervasive-penetrability view presents a difficult challenge. If cognitive penetration always and systematically occurs, perception would inform us almost always about what we already believe or feel, instead of informing us about features of the world (particularly when we encounter novel objects or events). This is a problem that is especially worrisome for epistemology (Stokes, 2012). Clearly, there would need to be varieties of perceptual penetration with varying degrees of penetrability. If experiences are analogous to beliefs in the sense that they require critical judgment and justification, then one must reflect on, as well as systematically analyze, what one perceives. This reflective analysis would constitute an effortful and highly top-down form of attention (perhaps even effortful voluntary attention to explicitly judged perceptual contents). Problematically, such a belief-based attentive process would need to dominate all other forms of perceptual attention for pervasive penetration to occur.

The consciousness and attention dissociation and the argument from evolution offer a way out of this challenge. It could be that cognitive penetration only affects access consciousness (i.e., access to information available for thought, memory, and action, but without subjective experience) at higher levels of cognitive integration. All effects of cognition on perceptual experiences can be explained by appeal to concepts, beliefs, or inferences, and perceptual contents remain impenetrable at the early stages. It could be, therefore, that top-down attention routines operate independently from phenomenally conscious perception. Motivational effects may be explained at higher levels of integration, which need not modify the contents of early phenomenally conscious perception. The forms of perceptual experience that evolved early, such as experiences of color, would be impenetrable. This theoretical possibility would solve the epistemic problem presented above. CAD could also explain why the pervasive penetration of conceptualized contents in access consciousness need not entail the pervasive penetration of phenomenally conscious perception (subjectively experienced perception).

There are, however, good reasons to believe that the view at the other extreme that rejects any form of penetrability is also too radical. For example, social interactions require perceptual processing and an understanding of the situational context (including other agents) in order to succeed. Categorizing new objects, events, or situations also requires a level of cognitive influence that may depend on previous experience or knowledge. The view of perception as Bayesian inference, for example, presents models of how perception can be constrained by prior experience, biasing detection of more likely features and limiting the possible interpretations of this information (e.g., see Kersten et al., 2004; Yuille and Kersten, 2006). Although we would argue that this sort of biasing is not a form of cognitive penetration of early perceptual processing, it can influence how this processing occurs and particularly influence how the contents of perception are interpreted. Such reasons exemplify why there must be an interface for cognitive penetration. These would be epistemically fundamental cases of cognitive penetration at later stages of perception, where the cognitive integration of emotion, cognition, and perception is at work. Here we try to strike a balance between these opposite views by appealing to the CAD framework and the argument from evolution (see Haladjian and Montemayor, 2015). A more nuanced view is required not only to solve the epistemic problem mentioned above, but also to achieve a comprehensive theory of perception that accounts for the epistemic and motivational significance of perception, and the Bayesian approach is particularly helpful here.
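As an illustrative sketch only, the Bayesian view of perception mentioned above can be written in its standard form. The symbols are assumptions introduced here for illustration and are not taken from the works cited: $S$ stands for a candidate description of the scene and $I$ for the sensory input.

$$
P(S \mid I) \;=\; \frac{P(I \mid S)\,P(S)}{P(I)} \;\propto\; P(I \mid S)\,P(S)
$$

Here $P(S)$ is the prior, which encodes how likely a scene description is given previous experience, and $P(I \mid S)$ is the likelihood of the input under that description. On this schematic reading, a prior can bias which interpretation is favored without the formula itself settling whether that prior reflects stimulus statistics built into an encapsulated module or a genuinely cognitive influence, which is the question at issue in this section.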

How exactly should the evolution of intentionality be understood, particularly with respect to CAD and cognitive impenetrability? One possibility is that humans and other species share many forms of early perception, with non-conceptual intentional content, which could be understood in terms of Peacocke's (1992) account of "scenario content." As Crane (2009) clarifies, such scenario content must be interpreted in terms of being in a state with non-conceptual content—a representational state such that being in it does not require the possession of concepts—even though such contents could be properly characterized by concepts by a creature with conceptual capacities, such as humans. We cannot be certain about how animals experience such contents, but it is highly likely that they must have similar experiences. Animals navigate, identify objects, react to color, and have similar sensorial systems. At some point in our evolution, our brains created routines to cognitively guide attention, but these routines cannot directly change early perception due to the requirement of feature constancy for survival, which includes features such as color and time (Lisi and Gorea, 2016). Then, even later in our evolution, we learned to explicitly interpret our perceptual experiences and to linguistically articulate such interpretations in terms of discursive inference (a capacity that seems to be exclusively human). Thus, genuine cases of cognitive penetration should not appeal to explicit inference, as when one "sees" that someone is late. But perception at higher levels of cognitive integration (e.g., above early vision) may present interesting cases of cognitive penetration by conceptualization. This would leave early processing encapsulated and impenetrable, and it would also open the door to interfaces between preconceptual perception and cognitively guided, conceptual perception.

Forms of cognitive integration also evolved, and they matter for the way in which perceptual contents are processed. For example, the cross-modal integration of information (e.g., auditory and visual) can indicate influences from one modality on another when attention is directed in a certain way (e.g., Palmer and Ramsey, 2012). Such cross-modal integration is often, though not always, related to conscious experience, with some theories of consciousness relying on the integration of information from multiple sources to produce the unified experience of consciousness (e.g., Tononi, 2012). This multi-level approach could help model possible forms of cognitive integration in terms of different interfaces that evolved at different times. Early perception remains impenetrable to guarantee stability, but in the course of evolution, contents are accessed and integrated, without affecting early perception. Then memory and motivational systems are also integrated into more complex cognitive states, guided by cognitively driven attention.

Early perceptual processes must, above all, provide reliable information about the environment independently of motivation or cognitive modulation. They include feature-based and object-based attention (Treisman, 1988), and motion tracking mechanisms (Pylyshyn, 1989; Cavanagh et al., 2001). Top-down pre-cueing and cognitive guidance operate at higher levels, after early selection mechanisms of attention have occurred (Yeh and Chen, 1999; Theeuwes, 2010). Thus, early vision provides a basic realm of perceptual experiences that inform navigation, immediate engagement with the environment, and even forms of planning that can be found in other species, such as birds (Clayton and Dickinson, 1998). As mentioned, this form of intentionality may be understood in terms of the notion of 'scenario content': an intentional state that need not be constituted either by concepts or propositional contents for it to be representational. Navigation in many species seems to demand this kind of intentionality and it must have evolved early (for discussion on how this topic relates to the distinction between analog and digital formats of mental representation, see Montemayor, 2013, chapter 3). It is very likely that in creatures with phenomenal consciousness, scenario content is deeply linked to basic experiences that inform them about the environment much in the same way as they inform us. Although many skillful reactions to the environment occur outside phenomenal consciousness, conscious experience is our most immediate guide for action. Access to content, on the other hand, requires higher levels of integration and the intervention of propositional attitudes, such as beliefs.

There is a related issue concerning how attention works outside conscious awareness in species that may not have phenomenal consciousness. Non-human species with complex attentive systems, such as dragonflies (Wiederman and O'Carroll, 2013), are also not likely to access navigational information propositionally (in terms of access consciousness and conceptual judgment). Here CAD presents an interesting possibility. Perhaps those attention capacities for navigation and object tracking in species like insects are extensionally equivalent to those of organisms that rely on phenomenal consciousness (they overlap in terms of their reference and how the organism reacts to stimuli). But for such extensional overlap to be possible, these early perceptual processes must be impenetrable, or at the very least, the impenetrability of such perceptual processes is the best explanation we have for their overlap across species. Obviously, understanding exactly how much perceptual guidance happens outside conscious awareness is an empirical issue. The claim we defend here is that the distinction between cognitively impenetrable perception and cognitive penetration is fundamental to account for the complexity of perception and its evolution. The challenge is to understand the relation between cognitively impenetrable perception and cognitively penetrable perception. To this end, we now proceed to discuss concept acquisition—one of the clearest instances in which an interface between cognition and perception must occur.

# CONCEPT ACQUISITION

The sharp distinction between cognition and perception, which some critics of cognitive penetration theorize as a central feature of the mind (see Firestone and Scholl, 2016), confronts a particularly pressing problem at the heart of the cognitive sciences: concept acquisition. In fact, the claim that such a strict demarcation is an essential aspect of the nature of the mind may even be empirically false (Kosslyn, 1980, 1994). For our purposes, we will focus only on how the sharp demarcation between cognition and perception generates problems for the issue of concept acquisition. We aim to show that although the pervasive cognitive penetration view cannot be true, as argued above, the opposite view that claims that no cognitive penetration ever occurs is also wrong. An important clarification is that cognitive penetration can occur in late perception (after early perceptual processing), and that preconceptual processes play a major role in providing an interface between cognition and perception at that level. Thus we defend the view that early perception cannot be directly affected by cognition, but that there is an interface that makes late (penetrable) perception possible and, in fact, systematic. The main difficulty is to explain the acquisition of perceptually based concepts that are critical for basic recognition tasks.

Just as we need to be clear about the sense in which cognition determines perception, we also need to be clear about what is meant by 'conceptual cognition.' First, consider the distinctions between memory, recognition, and seeing. Remembering is clearly different from seeing and memory-based attentional effects. Although memory may be crucial to guide perception and categorize novel objects (e.g., Vlach, 2016), it does not determine what we see. But why should conceptually based recognition be on par with memory as a non-perceptual process? Take for instance the evolutionarily crucial skill of recognizing kin and enemies. This fundamental capacity seems to be part of the perceptual system, and it seems to be the result of its evolution (Millikan, 2005). Additionally, recognizing something does not always require a full perception of it, since inferential processing can use key features to inform the representation based on memory, which indicates that recognitional abilities in animals must be a combination of perceptual and preconceptual capacities. Because of how basic these skills are for survival, two forms of recognition could be postulated: one dependent on memory and the other fundamentally perceptual (e.g., the automatic reaction to sensory inputs). This possibility would not be compatible with the sharp demarcation model (e.g., favored by Firestone and Scholl, 2016), since recognitional capacities seem to determine perceptual processing in such cases.

In what sense can preconceptual states that are not cognitively penetrable lead to attention modulation that is cognitively driven? As mentioned, one possibility is that conscious and unconscious non-conceptual states overlap systematically with contents that can be described categorically by an organism with conceptual capacities. Given the accuracy and reliability of the mechanisms that produce such preconceptual states, one could think of these states as a representational framework that structures an interface for more abstract representations. Language seems to be present only in humans, and as a matter of methodology, it is best not to attribute conceptual capacities to other species (Bermúdez, 2003, calls this a minimalist approach to non-linguistic thought). Taking a minimalist approach is fundamental to explain many navigational capacities that are best understood either as measurement-based representations or scenario contents. It would be inappropriate to characterize these representations in terms of language, concepts, or linguistic-propositional attitude psychology. Actually, some authors think that even in the case of propositional attitude attribution there are reasons to be skeptical about adopting a linguistic-propositional model instead of a more minimalist one (Matthews, 2007).

The proposal mentioned previously, that access consciousness may be responsible for cognitive penetration without causally and directly changing the contents of early perception (including phenomenally conscious perception), can now be spelled out in more detail. Early processing is cognitively impenetrable, intentional, and representational, and it can either be phenomenally conscious (producing experiences of a sensorial kind) or occur unconsciously, in accordance with CAD. These early perceptual states have a content that can be characterized as non-conceptual or non-propositional (for discussion of how to characterize the representational nature of these states see Montemayor, 2013). Then working memory and, in the case of humans at least, conceptual representations, can influence, guide, and indirectly determine the contents of perception at later stages. Working memory processes can also help maintain representations of task-relevant features by activating early feature selection regions of the visual cortex (Serences et al., 2009), which suggests a top-down influence on early vision activations. In fact, various studies testing the memory for sensory signals suggest that the circuitry underlying the working memory involved in these tasks includes cortical areas that do the processing of these signals (for a review, see Pasternak and Greenlee, 2005). Nevertheless, such modulations of early vision are consistent with the CAD approach. Also consistent with CAD is the indirect guidance of late perception, which may depend not only on access-conscious states with propositional content, but also on other motivational and phenomenologically powerful states, such as emotions. This is all consistent with early perception being cognitively impenetrable. But the interface between early and late perception shows that the interaction between perception and cognition is vital for concept acquisition. This clarification is important, because one way of interpreting Firestone and Scholl's (2016) proposal is that such an interface is never possible and that there is no kind of cognitive penetration, even at later perceptual stages.

As a cognitive phenomenon, concept acquisition seems to critically depend on perceptual processes on some level. Fodor (1983, 1998), who is a prominent proponent of the modular and encapsulated architecture view that is putatively incompatible with penetrability, explains concept acquisition as follows: "We have the kinds of minds that often acquire the concept X from experiences whose intentional objects are properties belonging to the X-stereotype" (Fodor, 1998, pp.137–138; his emphasis). These properties are not based on stored memories, otherwise how could one even acquire a concept? What Fodor calls a 'stereotype' is not a judgment, but a statistical notion that captures perceptual regularities (Fodor, 1998, p.138). Fodor insists that perceptual experiences are necessary for concept acquisition. If only judgments were necessary for this, how could one acquire a perceptual concept in the first place? So conceptual recognition seems to be an essentially perceptual process. Even if one holds that concepts are innate, perceptual processes are still necessary to acquire such concepts (obviously, for those who deny innatism, perceptual processes suffice to explain concept acquisition). Concept acquisition is neither explicit judgment nor merely unconscious inference, and favoring a modular and encapsulated architecture (e.g., Pylyshyn, 1999, 2003) can still be compatible with having a conceptual interface between cognition and perception.

Below, we draw a distinction between linguistic labels and conceptual categories, which further clarifies the processes underlying concept acquisition. First, we want to expand on how the distinction between early and late perception relates to traditional issues in epistemology. When you see a red cup, seeing it as a cup that has the property of being red obviously means that you possess the concepts 'red' and 'cup.' But your perceptual system can be in a phenomenal state with the red cup as part of its content, independently of these concepts (as it occurs with infants, and presumably in other species). In other words, your perceptual system can have a visual experience of the red cup without seeing it as an object that falls under the category 'red cup.' For this reason, it seems that theories in cognitive science must allow for the distinction between non-epistemic and epistemic seeing (e.g., seeing a bundle of features versus seeing something as an instance of a conceptual category).

Cases of expertise generate an interface not only with concepts, but also with larger repertoires of judgments and beliefs. Looking out your window, you see a bird land on a nearby tree limb and you notice its gray and black colors. Your expert friend, an ornithologist, sees not only the bird and its colors, but also sees it as a hooded crow. This contrast can be interpreted in several ways: you see an object and its colors, and after attending to it carefully you see that it is a crow; or you see a bird and while you see it as a crow, your expert friend sees it as a hooded crow. In the latter case it seems clear that you and your expert friend see the same bird (but see Siegel, 2010, for the claim that these might be different perceptual experiences with different contents). In the former case you see the bird and apply the concept 'crow.' Other species may see the bird and be in a perceptual state that disposes the animal to behave as if it were referring to crows in particular, but without needing to be in a conceptual or propositional state. You and your friend, however, are accessing information differently even though the content of your early perceptual experiences very likely overlaps. This is why access consciousness is associated with more complex forms of cognitive integration that occur at later stages of perceptual processing. You possess the concepts 'crow' and 'bird,' but only your friend can draw the inference that this is a specific kind of crow.

Expertise (and/or prior experience) can change how we see something conceptually, at the access consciousness level, but not perceptually, at the early phenomenally conscious level. It can affect perception at later perceptual stages, as when perceptual contents are integrated with motivational states. Being an expert might help you notice the nuanced details of a bird that enable you to identify it as a certain species, compared to the naïve observer that just sees it as some kind of bird (i.e., attention to the detail might differ, though the same perceptual contents are available to both observers). Expertise could provide a form of pre-cueing effect. For example, by tuning the nervous system to integrated contents, musicians are able to respond to multisensory stimuli more efficiently (Landry and Champoux, 2017). These effects modulate or guide attention, rather than determine what one perceives by affecting how information is processed. Even in cases of sensory phenomena, such as adaptation or negative after-images, changes in perception are due to the unusual and consistent activation of visual neurons (e.g., by forcing a constant fixation, a stimulus in the periphery can disappear due to neural fatigue), and would not be considered cases of cognitive penetration. In fact, these changes in adaptation occur because gaze is directed in such a way as to induce these phenomena, which are examples of how the modules of perception can be directed in ways to exploit their inherent characteristics, and not an example of cognition directly changing the processing within the modules of early perception (see Clifford et al., 2007). These adaptation effects occur at several levels of perception that include late ones, as in the case of face perception (Webster and MacLeod, 2011). 
It is the modulation based on concepts and propositional content that is distinctly characteristic of access consciousness, which according to CAD, need not characterize phenomenal consciousness, including subjectively experienced adaptation effects, thereby allowing for the cognitive impenetrability of early perceptual states.

Concept acquisition begins with perceptual processes that provide contents that need not be conceptualized to be informative. Then later perceptual stages interface with conceptual information and then store categorical information into memory. Such interfaces are critical for conceptual cognition. Of course, one can combine existing concepts to form new ones independent of direct perception (e.g., a "Pegacorn" can be easily imagined if one is familiar with Pegasus and unicorns). Concept acquisition is a product of various processes, some of which are purely perceptual and others purely cognitive, and many that are a combination of the two. Partly because of this, we believe that neither the absolute impenetrability nor the pervasive penetrability views are entirely correct.

The CAD framework also allows for graded distinctions that explain why, on the one hand, early perceptual processes are so stable regardless of background beliefs and emotions, and on the other hand, why highly integrated information is susceptible to distortions based on beliefs and emotions at later perceptual stages. This is a consequence of the argument from evolution. Since intentionality evolved, an interface between cognitively penetrable perception and cognitively impenetrable perception must have evolved. The perception of magnitudes offers a particularly interesting case. Perceptually represented magnitudes for motor control and navigation (e.g., duration, distance, or rate) differ from conscious attention to the duration of sensations and emotions, including experienced effort. The former are very reliable across species while the latter are susceptible to well-confirmed distortion effects (Kahneman, 2000). Partly because of the difference in integration and susceptibility to distortion effects, there are two models in assessments of experience based on their duration or intensity: the memory-based and moment-experience based models (Kahneman, 2000, p. 692). This contrast between the early perception of magnitudes and more recent interfaces between perceptual magnitudes and conceptualized experiences has clear implications for agency and planning, and it suggests that different species must represent themselves in time differently (Montemayor, 2010).

Perhaps among the evolutionarily oldest forms of early perception is the perception of magnitudes for navigation. Perceptual capacities for navigation are among the most reliable skills that have been verified across species, including insects (Gallistel, 1990). These perceptual capacities rely on representations that are non-conceptual, and can be explained in terms of scenario content (see also Montemayor, 2013, for discussion of why these are representational). Conceptualized emotions (and their duration and intensity), however, are much more difficult to verify in other species and cannot be assumed to be present in many of them (e.g., in insects that can reliably navigate and attend to magnitudes). Presumably, species with theory of mind capacities have a more complex interface for perception, emotion, and cognition, as the distinction between empathic and nociceptive pain shows. The possibility for cognitive penetration correlates with evolutionary history, as the argument from evolution entails, and also with the cognitive integration required for accessing propositional contents. The contrast between the perceived duration of emotions and the more basic perception of magnitudes (e.g., time, distance, and rate) can easily be accommodated by the CAD framework: there is an interface for the integration between emotions and judgments concerning intensity and value at much later stages of perception, but early perceptual processing of magnitude perception is cognitively impenetrable. This guarantees reliability, as mentioned before. In humans, there is also a conceptual interface for the integration of perceptual magnitudes and non-perceptual concepts, such as mathematical concepts concerning space, time, and rate. This interface is associated with access consciousness while the interface with emotions is a combination of late conceptual perception and the phenomenology of emotions. 
While some studies show that magnitude judgments can be calibrated systematically (e.g., Izard and Dehaene, 2008), these would be cases of modulating the interpretation of the output rather than a cognitive penetration of the magnitude estimation mechanism itself.

An interesting consequence of the argument from evolution in the context of CAD is that competing views about concepts may be correctly describing different levels of perceptual processing. Conceptual structure of the kind humans have is more abstract than any set of features or simple perceptual attention routines: it has a logical structure that allows for negation, valid inference, and compositionality. Such concepts cannot be reduced to the sums of the expected probabilities of features given a perceptual scene, but the earliest, cognitively impenetrable stages may be reducible to such feature or prototype-based analysis. This leads to two further implications of the argument from evolution concerning concepts in particular. First, the higher the degree of cognitive integration and penetration, the more logical structure is needed for cognitive influence. Second, the higher the degree of inferential integration, the more abstract and amodal the concepts are. This higher level of cognitive integration is the one typically associated with explicit judgment (i.e., explicit judgment has logical structure). This opens the possibility for different types of feature-based prototypes operating at early stages, and more characteristically abstract conceptual representations playing different roles at different interfaces, allowing for different forms of integration and de-modularization at later perceptual stages. These interfaces would be consistent with empirical findings, such as the cross-species findings on the perception of magnitudes and the findings on the distortion of duration judgments regarding emotions in humans. Finally, one finds a similar distinction between prototype-based categorization and more abstract concepts in human development (e.g., Keil, 1989). Developmental studies indicate that infants can obtain perceptual concepts before complex forms of abstract concepts (Spelke, 1988; Spelke and Kinzler, 2007; Carey, 2009). It is with this more advanced type of conceptual interface that we could find cognitive penetrability, at later stages of perceptual processing that are integrated with cognitively driven attention modulation.
These interfaces are, at the very least, evidence for the interrelation between perception and cognition at later stages. Thus, postulating different types of interfaces, based on the CAD framework and the arguments from evolution, may help explain cases of cognitive penetration at later stages while preserving the cognitive impenetrability of early perception, striking a balance between the prevailing opposite views.

# CAD AS A FRAMEWORK OF DISTINCTIONS FOR EMOTION, PERCEPTION, AND JUDGMENT

Emotions complicate the picture considerably. They are an important aspect of social cognition and interactions, particularly in terms of developing empathy and helping to understand others. For such reasons, emotional processing must be an integral component of human perception and cognition. Newen (2016), for instance, argues that emotions can be perceived in much the same way that perceptual features are perceived. Studies suggest that emotions can be recognized in the same way as pattern recognition in other sense modalities, driven by evolutionary necessity and requiring an interaction of bottom-up and top-down processes (see Newen, 2016). Similarly, socially relevant information seems to be processed automatically, thus calling into question whether perception should include attention to social cues (Neufeld et al., 2016). If it is true that emotions and socially relevant information are processed like perceptual features, this view would strongly favor a very robust kind of cognitive penetration because we not only see the basic perceptual constancies that ground object- and feature-based attention, but also emotional and socially relevant content. In other words, if this view is correct, then emotional and social beliefs would determine a substantial portion of perception. It is important to notice that even if this were the case, it would still be compatible with early perception being cognitively impenetrable.

The main problem, however, is that this example of penetrability could simply mean guidance. There is good reason to believe that the neural systems that support emotion overlap with cognition (Pessoa, 2008), and emotional states may be considered a form of pre-cueing. For example, an emotional state, like fear, can bias how one directs attention (e.g., to more threatening aspects of the environment) and thus improve interaction with the environment (LeDoux, 2012). This ability also includes non-conscious perception of emotional stimuli (see Tamietto and de Gelder, 2010). If these pre-cueing effects are very robust and systematic, there is a very clear sense in which they determine what one perceives, thus favoring some level of penetrability at later stages of processing.

Just how powerful, exactly, can cognitive penetration be in the case of emotions without being cognitively pernicious (e.g., by altering the contents of perception too much and rendering crucial perceptual invariances unstable and unreliable)? CAD also helps elucidate this issue. Emotions have an enormous impact on conscious awareness, but this impact need not be either fully perceptual or inferential. We believe this is a significant source of confusion. Emotions have a significant impact on an individual's overall phenomenology, but having too much impact on awareness can distract from or may even suppress what one perceives. In such cases, the phenomenon is one of interference or hindrance of perception rather than a determination of perception (e.g., as with post-traumatic stress disorder). In other cases it may enrich perception not by determining it, but by adding vivacity to the overall phenomenological experience. Aesthetic experiences and the vivacity of certain autobiographical memories are good examples of this phenomenon (Montemayor and Haladjian, 2015, pp. 150–165). All these cases are best understood as late perceptual cognitive penetration (perhaps motivational penetration is a better term), rather than cognitive penetration of early perception (for instance, early vision).

Color perception further elucidates the importance of CAD for rigorously distinguishing cases of cognitive penetration at later stages of perceptual processing from cognitively impenetrable early vision. Color perception involves two distinct neural circuits, one for color detection and another one related to circadian regulation and emotion (Pauers et al., 2012). Do we perceive emotions when we perceive color? This does not seem plausible. Rather, we detect and experience color in early vision, and we also experience a complex state of perceptual and emotional contents at later stages of processing. Even in the case of an individual's memory of an object's expected color, which can influence the perceived color appearance of an object (see Hansen et al., 2006), such findings do not conclusively indicate cognitive penetration of early visual perception, but rather of the stage that includes the interpretation of the signals from early vision. To complicate things further, some aspects of feature detection may occur outside consciousness; they are mostly independent in their neural correlates (Koch and Tsuchiya, 2012). Priming of color can occur at higher levels of processing even without conscious perception of the color, as in studies that use backward masking to test priming of responses to colors that are not consciously seen (Norman et al., 2014). To accommodate this fact we need a graded framework like CAD rather than a sharp distinction between cognition and perception or a pervasive form of cognitive penetration. Color detection and color-based emotions do interact systematically at the later stages of perceptual processing that are also phenomenally conscious, but this does not entail that emotion penetrates color detection or early visual color experiences.

Regarding the cases of automatic social detection (e.g., Neufeld et al., 2016), these could be similar to detection patterns associated with social planning routines that operate independently of experienced emotions and feelings. Thus, based on CAD, it is not so easy to say that emotion is detected as part of perception, because such routines could be modeled either as unconscious processing or as specific attention routines triggered by specific perceptual conceptualized contents, rather than being constitutive of early perception, since this detection is not altered by overall phenomenology (a point entirely analogous to the distinction between magnitudes and emotion intensity mentioned above). Generally, it may be that such pattern detection of social cues would not entail systematic penetrability because they may actually occur at late levels of cognitive processing or not be perceptual at all (e.g., they could be inferential or strictly mnemonic).

The CAD framework can explain many changes in perception at later stages while justifying the impenetrability of early perception. Merely appealing to phenomenology and to how switching from one attention task to another varies what one experiences does not suffice to prove penetrability, precisely because of the distinctions based on the levels of CAD. Moreover, even the phenomenology of perception favors stability and continuity in experience, rather than variability caused by constant cognitive penetration. For example, as one moves around a room, the experienced color and shape constancies of the walls and furniture remain the same despite the many inferential triggers, actual and potential, that one has at any single moment. Strikingly, this also seems to be the case in dreams where there is a generally coherent experience, no matter how absurd it may be. Therefore, appeals to conscious experience may not provide decisive evidence for cognitive penetration because overall phenomenology depends on cognitive integration at late stages of perceptual processing in a way that is compatible with early perceptual impenetrability. What one needs to show in order to verify radical and pervasive cognitive penetration is that cognition determines perception at an essential level, at the earliest stages, causing changes in perception in a direct way. CAD shows that the evidence can be understood in a way that avoids this interpretation because CAD demonstrates that cognition and perception can be independent and yet interact in systematic ways. In particular, concept acquisition of basic perceptual categories is a good place to identify clear cases of cognitive penetration beyond the initial stages of early perception.

# COGNITIVELY DRIVEN ATTENTION: FEATURE-BASED, SYNTACTIC, AND SEMANTIC

It is important to restate why resisting pervasive cognitive penetration is plausible not only because of the argument from evolution, but also as a general theoretical commitment. One reason is the problem of the impossibility of common ground among perceivers. If there is no common ground, how can one explain reliable coordination among multiple subjects for motor control and attentional tasks (e.g., those executed when playing team sports)? One solution, offered by CAD, is that while there are significant levels of cognitive penetration at highly integrated levels of cognition and perception, there is no cognitive penetration at early conscious and unconscious perception. But it is also important to explain how exactly top-level processes influence perceptual experience. This is what CAD allows for: cognitive impenetrability of early processing with rich influence from cognition at higher levels of cognitive integration (e.g., attention to the intensity of emotions, or the importance of an autobiographical memory), which correspond to evolutionarily more recent types of attention (for a criticism against the view that top-down pathways entail cognitive penetration, see Raftopoulos, 2001a,b).

There are several possible areas of higher-level cognition that could be susceptible to cognitive penetration. According to CAD, phenomenal consciousness varies systematically with emotional and background knowledge contents: it is empathically structured (Montemayor and Haladjian, 2015). How susceptible the more semantic aspects of the mind are to inference and emotional influence may depend on the concepts a species has and the degree of information integration; hence the importance of concept acquisition. To repeat, early perception is cognitively impenetrable, which allows for reliable and predictable motor control and coordination with external objects. These contents are processed independently of the empathic and integrative influences of cognition and emotion. This structural requirement is related to adaptive necessity, and likely appeared in other species that are evolutionarily close to humans (Zentall, 2005). Furthermore, human-like conscious awareness seems dependent on a global functional connectivity among brain modules (Wu, 2014; Godwin et al., 2015), and this may indicate a form of penetrability at later, more integrated stages of perception.

At the early stages, perceptual features are processed independently, with minimal top-down modulation, in order to reliably and accurately structure the perceptual scenes (e.g., auditory or visual scenes). This representational scaffolding supports later cognitive guidance and can be characterized as scenario content or preconceptual sensorial representation.

Then, at later stages that likely depend on the intervention of working memory, feature-based attention can be guided and oriented by cognitively driven forms of attention that highlight some perceptual features and suppress or inhibit others based on cognitive and motivational information. Some of these cognitively driven types of attention likely evolved at different times. Some of them modulate detection; others exclusively concern conceptual information and can only be found without controversy in humans. The range of influence of cognition on perception is quite vast and it increases with the degree of cognitive integration, characteristic of late perceptual processing. According to CAD, there is a type of attention at late stages of integration that is fully independent of specific perceptual experiences and that is exclusively driven toward access to propositional contents. We have argued that this kind of cognitively driven attention plays an important role in specifying the contents of late perception, but that it cannot directly change early perception, including the perceptual experiences associated with early stages of perception.

The consciousness and attention dissociation also helps address the previously mentioned difficulty that perception may occur outside consciousness, independently of whether or not the contents of perception are susceptible to cognitive penetration. Consider the result by Vishton et al. (2007) concerning the Ebbinghaus illusion, in which the instruction to grasp the stimulus reduces the illusion. This effect of a reduction in the illusion was found in previous studies where acting on a stimulus producing an illusion indicated a more accurate internal representation than what was consciously perceived. That is, while the phenomenology of perception is tricked by an illusion (e.g., the Müller-Lyer illusion), perception for action is not (Stöttinger and Perner, 2006). Also, this kind of performance can be affected by emotional states (van Ulzen et al., 2008), which indicates that emotion can influence conscious experience, but only at a higher level of integration and processing, as the argument from evolution entails. Similarly, desiring something might affect how it is consciously perceived; for example, an appealing location might seem closer than an unappealing one that is at the same physical distance from the observer (Alter and Balcetis, 2011). Such studies are examples of how conscious perceptions may be affected by certain mental states, and that there can be a dissociation between the information that enters awareness and the unconscious information used for other perceptual processes. It is unlikely, however, that these effects could influence the perceptual-navigational system (e.g., the system we use to walk across a room), or the experiences produced by early perception.

Another example of how feature detection in conscious perception can differ from that used to execute motor actions is seen in an experiment investigating the double-drift illusion. This illusion occurs when an object moves in the periphery of the visual field along a specific trajectory, but because the object has a texture that moves orthogonally to this trajectory, the overall perceived movement of the object does not correspond to the veridical path. In other words, an illusory path is perceived because of the combination of motion information from the internal motion of the object as well as its actual trajectory. In a recent study by Lisi and Cavanagh (2015), participants were asked to make an eye movement to one of these moving objects (that disappeared as soon as the eye movement began), and they found that the eyes landed closer to the veridical path as opposed to the perceived illusory path. This suggests that the information sent to the motor system is not susceptible to the illusion, since the motor system can execute correct eye movements, even though the illusion is consciously perceived. An implication of these results is that unconscious perception can be highly accurate as well as integrated with cognitively driven goals.

The CAD framework also allows for a more useful distinction that can potentially clarify ambiguities. Consider Kravitz and Behrmann's (2011) finding concerning facilitation by a concept: faster response times to detect 'h' based on prior exposure to 'H.' This kind of cognitively driven attention to syntactic features should not be considered cognitive penetration. For similar reasons, semantic priming should also be considered an attentional effect that is cognitively driven and that occurs at later stages of processing. In the evolution of the visual and other perceptual systems, it is likely that feature-based attention and basic forms of object-based attention evolved first, and only later can one find complex forms of semantically driven attention to features relevant to expertise and propositional contents (see Haladjian and Montemayor, 2015). Attention based on propositional content is, therefore, a kind of cognitive guidance that must occur at later stages of perceptual processing and which must have evolved more recently. This kind of cognitive guidance at later stages can influence inference, memory, object recognition, and concept categorization. In the case of human cognition and perception, this kind of cognitively driven attention to semantic contents is the most important component that facilitates a powerful interface between cognition and perception, and it provides the basic scaffolding for concept acquisition of all kinds. As mentioned before, concept acquisition allows for many kinds of cognitive penetration at later stages of processing, and it is fundamental to understand human perception.

There is yet another, and perhaps even more recent, kind of cognitively driven attention that modulates late perceptual contents: attention to syntactically structured perceptual patterns. The complex hierarchical structure of human language must be somehow perceived. The question is exactly how. If Berwick and Chomsky (2016) are right, the capacity to detect syntactic patterns evolved quite recently in our species. In fact, if it is true that the capacity to articulate and combine strings of symbols hierarchically is as recent as 200,000 to 150,000 years ago (Berwick and Chomsky, 2016, p. 54, indicate that it is only 60,000 years ago that it certainly emerged), then it must be one of the most recent events in our cognitive evolution. While syntax processing has a very significant impact on human cognition, it need not operate by constantly influencing what we perceive (unlike conceptually based late perception, which is essential for epistemic seeing and epistemic perception more generally). Rather, it may operate in the way motor control operates: in a highly automatic and reliable fashion that cannot be made explicit through discursive judgment and which processes information beyond conscious access. If so, even in spite of its very recent evolution, syntax processing may not provide a robust interface for cognition and perception, and interesting cases of cognitive penetration at late perceptual stages may be limited to semantic processing. This is an issue that needs to be studied in more detail.

This brings us to the last point we want to make. The fact that perception is stable and invariant at the early stages supports not only our cognitive systems but also motor control and action. Early perception plays the critical role of making this possible, by not allowing direct causal influences from cognition or emotion on the processing of the most basic stages of perceptual scene structuring. Basic perceptual experiences are also stable in this way and, moreover, they are experienced in a way that does not necessitate conceptual or propositional guidance. Conscious access to contents, on the other hand, likely requires a high level of integration of information within the brain, which is the argument made by global workspace theories of consciousness (e.g., Dehaene and Naccache, 2001; Baars, 2005), with increased functional connectivity among different neural modules (rather than within modules) being associated with such conscious awareness (Godwin et al., 2015). Cognitive penetration is likely to be found at these later stages of perceptual processing, and crucially, at the interface between early contents and different forms of concept formation. A further question is the extent to which empathic and motivational effects guide late perception. With the rich conceptual framework of human cognition, the interface between emotion, cognition, and perception allows for many kinds of cognitive penetration at these later stages of processing. Semantic and syntactic guidance through cognitively driven attention is a critical part of this process.

Acquired concepts do not directly affect how features are detected at the earliest level, but they do determine what we epistemically perceive (e.g., as a member of a category). Having a specific concept is not as urgent as responding to a feature essential for survival, but basic categorization, even if it is of a preconceptual kind, can help in urgent situations, as the alarm calls of some animals show. It can also lead to complex forms of planning, mental travel, and even theory of mind, as the quasi-conceptual capacities of birds demonstrate. Fully fledged concepts, as found in humans, lead to a cognitive framework that allows not only for epistemic seeing, but also for inferential judgment (including discursive inference), and epistemic justification. Based on CAD and the argument from evolution, it is useful to think of these capacities as falling under different levels of cognitive integration at higher levels of perceptual information processing. To reiterate, this is all compatible with the cognitive impenetrability of early perception.

Thus, CAD helps clarify how the fact that perception is deeply related to cognition and emotion is compatible with the cognitive impenetrability of early perceptual processing. Conceptual interfaces are at the center of the relation between cognition and emotion. These conceptual interfaces manifest in forms of perceptual pre-cueing, biases, modulation, and guidance through the mechanism of cognitively driven attention. These interfaces also provide the framework for the type of consciousness associated with access to propositional contents, which according to CAD, is dissociated from the experiences produced by early phenomenally conscious perception. Early perception guarantees stability and reliability, as well as a perceptual common ground with other organisms. Late perceptual processing provides a rich framework of possibilities that enrich perception in many ways. Finally, semantic and syntactic influences in late perception increase these possibilities in ways that cannot be found in any other species, and make human perception the rich manifold of contents that makes possible the very complex behavior that characterizes humanity.

# CONCLUSION

How the world appears to us can depend largely on our expectations, beliefs, and desires. The debate on cognitive penetration has explored this issue in the last few decades from different perspectives, particularly those concerning cognitive architecture and semantic content. The conclusion many authors reach is that cognitive penetration is either largely pervasive or nonexistent. We argue that a more nuanced perspective is required. The CAD framework allows for such a perspective, informed by findings from the research on consciousness and attention, and their evolution. More specifically, CAD helps explain why, although there may be many cases of cognitive penetration in late perception, early perception must be cognitively impenetrable.

With the CAD framework, a more balanced approach to cognitive penetration is feasible. An interesting question is: could a similar balance be achieved without it? We cannot explore this issue in detail here, but we believe that at the very least, CAD is the best way to achieve this balance. It may be the only way to achieve such a balance in a rigorous way, but we will not argue for this stronger claim here. We do, however, offer this consideration in favor of CAD: the evidence, including the evidence from evolution, does not support an interface without CAD as strongly. For instance, such an interface could concern only unconscious processing (e.g., constituted by Helmholtzian inferential abilities). Alternatively, this interface could involve exclusively conscious information, requiring subjectively experienced integration for any perceptual process. The evidence indicates that neither of these options is likely to be true. Thus, the interface between cognition and perception seems to be fundamentally structured in terms of CAD.

Given the implications of CAD and the argument from evolution, we argued that concept acquisition is a particularly important topic with respect to cognitive penetration, with ramifications for the integration of emotions, inferential reasoning, and recognitional processes. Perception and cognition may be largely independent, and they are fully independent at early stages, but there are systematic ways in which they interact. The more cognitive integration there is, the more cognitive penetration one finds. Perhaps, as suggested above, there may even be more than one interface for cognitive penetration because there are many kinds of cognitive modulation in late perception. Yet despite this systematic interaction between cognition and perception at such late stages, cognitive penetration is not pervasive.

Besides providing positive suggestions for addressing the problem of penetrability in a more thorough theoretical way, this paper also raises challenging questions. What kind of conceptual or epistemic capacities underlie different forms of penetrability? Which capacities necessitate cognitive penetration? How can one verify such capacities across different species? How is it possible to integrate the findings on consciousness and attention, as well as their dissociation, in a way that addresses the problem of cognitive penetration? The findings on animal cognition and future research on how our own capacities compare to those of other species, particularly in the development of semantic and conceptual guidance, are fertile ground for exploration. The argument from evolution, especially as it concerns the development of different forms of intentionality, should help guide future investigations in this area.


## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

## FUNDING

HH received postdoctoral research funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement No. AG324070 awarded to Patrick Cavanagh.

# ACKNOWLEDGMENTS

We are especially indebted to Athanassios Raftopoulos for extensive, detailed, and enormously helpful feedback. This paper improved substantially because of him. We would also like to thank Albert Newen for valuable discussion, and two reviewers for their insightful suggestions and recommendations.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Montemayor and Haladjian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pre-Cueing Effects: Attention or Mental Imagery?

Peter Fazekas<sup>1,2</sup>\* and Bence Nanay<sup>3,4</sup>

*<sup>1</sup> Philosophy and Cognitive Neuroscience Research Unit, Aarhus University, Aarhus, Denmark, <sup>2</sup> Centre for Philosophical Psychology, University of Antwerp, Antwerp, Belgium, <sup>3</sup> Department of Philosophy, University of Antwerp, Antwerp, Belgium, <sup>4</sup> Peterhouse, University of Cambridge, Cambridge, England*

Keywords: attentional modulation, baseline activity, filtering, mental imagery, cognitive penetration

# ATTENTION AND COGNITIVE PENETRABILITY

Much has been written recently about cognitive penetration. If there are perceptual computations that are directly influenced by the information content of certain cognitive states such that the changes in the output of these computations can be accounted for in terms of the content of the penetrating cognitive states, we can talk about the cognitive penetration of perceptual processing.<sup>1</sup>

When considering the possible mechanisms that could mediate cognitive penetration, attention, traditionally, is quickly sidelined as a phenomenon that is trivially unable to exert the right kind of effect on perception. Even if the allocation of goal-directed (top-down, endogenous) attention is driven by the content of certain cognitive states (i.e., goal representations), it does not have a direct influence on perceptual processing itself. For, according to the traditional characterization, attention acts as a filter, a gatekeeper (Broadbent, 1958), or a spotlight (Posner, 1980) that selects and enhances certain signals (corresponding to attended stimuli) while attenuating or filtering out competing signals "prior to the operation of early vision" (Pylyshyn, 1999: p. 344).

#### Edited by:

*Athanassios Raftopoulos, University of Cyprus, Cyprus*

#### Reviewed by:

*John Zeimbekis, University of Patras, Greece*

#### \*Correspondence:

*Peter Fazekas peter.fazekas@cas.au.dk*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *23 September 2016* Accepted: *06 February 2017* Published: *06 March 2017*

#### Citation:

*Fazekas P and Nanay B (2017) Pre-Cueing Effects: Attention or Mental Imagery? Front. Psychol. 8:222. doi: 10.3389/fpsyg.2017.00222*

This traditional understanding has recently been questioned by empirical findings demonstrating that attention is not a passive gatekeeper mechanism acting before the start of perceptual processing, but rather an active modulator of perceptual computations that is able to exert many different effects at many different levels of the perceptual hierarchy (see e.g., Reynolds and Chelazzi, 2004; Nanay, 2010b; Noudoost et al., 2010; Carrasco, 2011, 2014; Lupyan, 2015; Wu, in press).

However, despite this transition from seeing attention as a passive gatekeeper to seeing it as an active modulator, opponents still argue against attention-mediated cognitive penetration on the basis of the filter-like nature of attention. As Firestone and Scholl have recently put it, attending is "importantly analogous to seeing through a tinted lens—merely increasing sensitivity to certain features rather than others" (Firestone and Scholl, 2017: p. 23, but see also Lupyan, in press).

# PRE-CUEING AND ATTENTIONAL MODULATION

Thinking about attention as a filter, even in light of recent experimental data and the conceptual shift they have prompted, is supported by some of the empirical findings.

<sup>1</sup> See, for example, the much-cited passage from Zenon Pylyshyn: "if a system is cognitively penetrable then the function it computes is sensitive, in a semantically coherent way, to the organism's goals and beliefs, that is, it can be altered in a way that bears some logical relation to what the person knows" (Pylyshyn, 1999: p. 343). Pylyshyn was interested in the cognitive penetrability of early visual processing, whereas in contemporary discussion the emphasis has shifted to perceptual processing underlying conscious experiences (Macpherson, 2012, see also Teufel and Nanay, 2017 on the distinction). We will concentrate on the former question here. Recently, Raftopoulos (2009, 2014) has offered a definition of early vision in terms of perceptual processing occurring within 120 ms after stimulus presentation (see also Raftopoulos and Zeimbekis, 2015). Our focus on pre-cueing effects ensures that our claims are applicable even to this characterization of early visual processing.

At the behavioral level, attention increases processing efficiency: the allocation of attention enhances detection rates, speeds reaction times, and increases accuracy (Posner, 1980; Posner et al., 1980; Castiello and Umiltà, 1990; Carrasco, 2011). Neural-level studies suggest that attention achieves all of this by enhancing the neural signals encoding the stimulus features in question, i.e., by modulating the behavior of sensory neurons in various ways, including amplifying neural responses (Carrasco, 2011), sharpening response functions (Martinez-Trujillo and Treue, 2004; Maunsell and Treue, 2006), and remapping receptive fields (Anton-Erxleben and Carrasco, 2013). Most importantly from our present perspective, attention amplifies neural responses via multiplicative effects like evoking response gain or contrast gain, and also via additive effects like increasing baseline activity (Buracas and Boynton, 2007; Carrasco, 2011; Cutrone et al., 2014).

Pre-stimulus cues increase related baseline activity well before the occurrence of the stimulus (Chawla et al., 1999; Reynolds et al., 2000). This enhanced baseline or spontaneous activity correlates with increased behavioral performance, such that subjects with a large modulation of baseline activity perform better once the stimulus is presented (Giesbrecht et al., 2006). That is, with a pre-stimulus boost of the spontaneous activity of neurons tuned toward a target, the sensitivity of these neurons is increased, and therefore stimulus processing is enhanced.
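The difference between the multiplicative and additive effects mentioned above can be made concrete with a toy contrast-response function. The sketch below is only an illustration, not a model taken from the cited studies: it assumes a Naka-Rushton-style response curve, and the function names, parameter values, and effect sizes (`r_max`, `c50`, `n`, `baseline`, the factors 1.3 and 1.5, the offset 0.1) are arbitrary choices made for the example.

```python
def naka_rushton(contrast, r_max=1.0, c50=0.3, n=2.0, baseline=0.0):
    """Toy contrast response: baseline + r_max * c^n / (c^n + c50^n)."""
    c_n = contrast ** n
    return baseline + r_max * c_n / (c_n + c50 ** n)


# Hypothetical "unattended" parameters (illustrative values only).
UNATTENDED = dict(r_max=1.0, c50=0.3, n=2.0, baseline=0.0)


def attended(contrast, effect):
    """Apply one assumed attentional effect to the unattended curve."""
    params = dict(UNATTENDED)
    if effect == "response_gain":
        # Multiplicative: scales the response at every contrast.
        params["r_max"] *= 1.3
    elif effect == "contrast_gain":
        # Multiplicative on effective contrast: shifts the curve leftward.
        params["c50"] /= 1.5
    elif effect == "baseline_shift":
        # Additive: raises activity even when no stimulus is present.
        params["baseline"] += 0.1
    else:
        raise ValueError(f"unknown effect: {effect}")
    return naka_rushton(contrast, **params)


if __name__ == "__main__":
    for c in (0.0, 0.1, 0.3, 1.0):
        row = {e: round(attended(c, e), 3)
               for e in ("response_gain", "contrast_gain", "baseline_shift")}
        print(c, round(naka_rushton(c, **UNATTENDED), 3), row)
```

In this toy model, only the additive baseline shift changes the response at zero contrast, which is the feature that matters for pre-cueing: it is the one effect that can be present before any stimulus appears.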

One way of describing these findings, and one that is standard in the literature, is that this is an attentional effect. When attention is turned toward a specific spatial region or a particular feature value, the activity of cortical neurons selectively responding to the specific spatial region or particular feature value increases. Pre-cueing studies show that this can even be true without the presence of any stimuli in the specific region or with the particular feature. In those cases, top-down attentional modulation increases the activity of those neurons which are sensitive to the spatial position or feature value indicated by the endogenous pre-cue. Since this process is driven by cognitive contents, this provides a nice demonstration of the cognitive penetration of perception by attention.

However, if we construe these studies this way, then the concept of attention at play here will be attention that does act very much like a filter: not as a mere gatekeeper simply letting through some stimuli while blocking others, but as a more advanced filter that is able to modulate certain features of the light passing through it. Also note that attention exerts this effect before stimulus presentation, i.e., well before the start of stimulus processing. That is, in these cases it seems that the opponent of attention-mediated cognitive penetration could run a very simple objection: attention does not seem to affect perceptual processing itself, at least not in a direct way; it only increases the sensitivity of processing units, readying them for the stimuli to come. In short, everything the pre-cueing studies show us about attention would be consistent with a Pylyshyn-esque picture of cognitive impenetrability: there are top-down attentional effects at the entry level of perceptual processing, but not afterwards.

# PRE-CUEING AND MENTAL IMAGERY

As we have seen, the claim that pre-cueing studies show that perception is cognitively penetrated by means of attentional mechanisms is problematic. Nevertheless, we do want to argue that pre-cueing studies show that perception is cognitively penetrated, not via the mediation of attention, but via mental imagery. In what follows we will argue that cue-induced mental imagery provides a channel through which cognitive states can exert effects on perception that fulfill the requirements of cognitive penetration.

The concept of mental imagery has been controversial, but we want to use a fairly non-demanding characterization, going back to Kosslyn, Behrmann, and Jeannerod: "Visual mental imagery is 'seeing' in the absence of the appropriate immediate sensory input, auditory mental imagery is 'hearing' in the absence of the immediate sensory input, and so on. Imagery is distinct from perception, which is the registration of physically present stimuli." (Kosslyn et al., 1995, p. 1335). This is the sense in which contemporary psychology and neuroscience (but not philosophy) talk about mental imagery. Just one example from a recent review article: "We use the term 'mental imagery' to refer to representations [...] of sensory information without a direct external stimulus" (Pearson et al., 2015). We can summarize this concept as "perceptual processing that is not triggered by corresponding sensory stimulation in the relevant sense modality" (Nanay, 2016).

Note that mental imagery, understood this way, does not have to be voluntary; it is often involuntary (as in flashbacks or in the case of earworms). It does not have to be conscious either (if sensory stimulation-driven perceptual processing can be unconscious, then so can perceptual processing that is not triggered by corresponding sensory stimulation). And while it is typically driven by top-down information, it can also be triggered laterally (by information in another sense modality) or in a bottom-up manner (as in the case of the blind spot, where the information is provided by the regions of the retina around the blind spot). It is also important to note that by "perceptual processing" what is meant in these definitions is "early cortical processing": in the case of the visual sense modality, for example, we have early cortical activation in the primary visual cortex that does not correspond to the retinal activation.

Pre-cueing studies could be interpreted in this theoretical framework as instances of mental imagery: pre-cueing induces early perceptual processing (as early as V1) that is not triggered by corresponding sensory stimulation in the relevant sense modality (that is, by corresponding retinal activation). In other words, given the definition of mental imagery above, pre-cueing induces mental imagery of the pre-cued feature. This is true of pre-cueing for a number of features, such as shape, color, and motion (see Shibata et al., 2008 for a good summary; see also Zhuang and Papathomas, 2011).

Mental imagery interacts with the perceptual processing of stimuli at all relevant stages of the perceptual hierarchy, starting with the earliest one. Early cortical processing of a presented stimulus during mental imagery leads to a mixed imagery/perception state, where the activation of V1, for example, is determined partly by the visual stimulus and partly by mental imagery. This is clearest in the studies of illusory contours, where the early perceptual processing of illusory contours (in V1 and V2) is a mixture of amodal completion (which comes out as mental imagery according to our definition) and stimulus-driven processes (Kovács et al., 1995; Sugita, 1999; Bakin et al., 2000; Lee and Nguyen, 2001; Komatsu, 2006; Hedgé et al., 2008; Lommertzen et al., 2009; Vrins et al., 2009; Nanay, 2010a; Smith and Muckli, 2010; Bushnell et al., 2011; Shibata et al., 2011; Lee et al., 2012; Pan et al., 2012; Ban et al., 2013; Emmanouil and Ro, 2014; Hazenberg et al., 2014; Scherzer and Ekroll, 2015).

Some instances of amodal completion may be fully bottom-up driven, like the completion of simple shapes purely on the basis of Gestalt forms (that can go against our best judgments). At other times, however, amodal completion is driven in a top-down manner, for example, in the case of seeing the cat behind the picket fence. Depending on what cats I have encountered before, the way I complete this figure would be very different. The same goes for the amodal completion of letters and words.

One experimentally controlled example of top-down driven amodal completion (that is, mental imagery according to the definition above) and the way it interacts with perception comes from studies of how we perceive two-tone pictures before and after information is given about what the picture is of (Teufel et al., 2015). Here, the mental imagery we use to complete the illusory contours very much depends on top-down information, and this influences very early (V1) perceptual processing.

Because of the multiple and very early interactions between the perceptual processing of stimuli and mental imagery, mental imagery influences the way stimuli will get processed throughout perception (as opposed to exerting modulatory effects only at the input of early perceptual processing), thereby avoiding Pylyshyn-esque lines of objection. And given that most instances of mental imagery depend on content-driven top-down influences (Macpherson, 2012), this means that mental imagery can modulate perceptual computations in a direct, top-down, content-sensitive manner.

This is our argument for the claim that pre-cueing studies show that perception is cognitively penetrated via mental imagery. It is important to be clear about the relation between attention and mental imagery here. We do not want to question the role of attention in pre-cueing studies. After all, it is attention that is being pre-cued. The pre-cue draws attention to certain features, which via top-down connections induces mental imagery for the pre-cued properties, which then, after stimulus presentation, interacts with and influences the online computations that process stimulus features. That is, what mediates the cognitive penetration of perceptual processing is not pre-cued attention, but cue-induced mental imagery.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

## FUNDING

The study was supported by the Danish Council for Independent Research & FP7 Marie Curie Actions - COFUND DFF-Mobilex Mobility Grant 1321-00165, and the FWO Postdoctoral Fellowship 1.2.B39.14N (PF).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Fazekas and Nanay. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pre-cueing, Perceptual Learning and Cognitive Penetration

Dimitria Electra Gatzia<sup>1,2</sup>\* and Berit Brogaard<sup>3,4</sup>

*<sup>1</sup> Department of Philosophy, University of Akron Wayne College, Akron, OH, USA, <sup>2</sup> Centre for Philosophical Psychology, University of Antwerp, Antwerp, Belgium, <sup>3</sup> Philosophy, University of Miami, Miami, FL, USA, <sup>4</sup> Department of Philosophy, University of Oslo, Oslo, Norway*

Keywords: cognitive penetration of perception, covert endogenous attention, perceptual learning, visual perception, cognition

In *The Principles of Psychology*, James (1981) suggested that attending to a stimulus can make it appear more "vivid and clear." Pre-cueing, the procedure in which a cue stimulus is presented to direct a subject's attention to the location of a test stimulus, has been used to test James' hypothesis (Posner, 1978, 1980; Carrasco et al., 2004, 2006; Yeshurun and Rashal, 2010; Carrasco, 2011; Kravitz and Behrmann, 2011). A recent debate concerns whether pre-cueing effects associated with covert attention involve cognitive penetration. In the context of information processing, cognitive penetration occurs when the information content of cognitive states directly influences perceptual computations in such a way as to alter their output.<sup>1</sup>

Attention is the process that either enhances the representation of relevant information (e.g., a scene at a certain location or an aspect of a visual scene) at the system level or diminishes the representation of irrelevant noise. Studies show that attention boosts the apparent stimulus contrast (Carrasco et al., 2004) and increases contrast sensitivity, which seems to be mediated by contrast gain, an effect akin to a change in the physical contrast of the stimulus (Carrasco et al., 2006). In addition, attention enhances spatial identification accuracy (Yeshurun and Rashal, 2010) and aids in the segmentation of the retinal image by increasing both first- and second-order sensitivity to the attended location (Barbot et al., 2012).

Attention can be allocated either by a bodily reorientation, say, when the eyes move in the direction of a target location (overt attention), or by shifting the direction of attention without reorienting the body, say, when attention is drawn by the salience of a cue while the eyes remain fixed (covert attention, see Findlay and Gilchrist, 2003). Cases that involve overt attention are not treated as cases of cognitive penetration because overt attention functions as a passive partition mechanism acting prior to the beginning of perceptual processing (Pylyshyn, 1999; Macpherson, 2012; Deroy, 2013; Mole, 2015; Firestone and Scholl, 2016; Brogaard and Gatzia, 2017). Recently, however, it has been suggested that cases of covert attention are instances of cognitive penetration (Mole, 2015; Wu, in press) because covert attention functions as an active controlling influence on perceptual processing (Nanay, 2010; Carrasco, 2011, 2014).

Overt and covert attention can be exogenous or endogenous. Exogenous attention corresponds to a reflexive, involuntary response to a location upon the occurrence of a sudden or intense stimulation; it is oriented in a stimulus-driven or bottom-up manner (Carrasco, 2006, 2011). For example, when letters appear abruptly on a computer screen, they capture the eyes' attention and elicit faster responses than when they appear gradually (Yantis and Jonides, 1984; Jonides and Yantis, 1988). Endogenous attention corresponds to our ability to monitor stimulus information (typically based on a cue) voluntarily; it is oriented in a goal-driven or top-down manner (Carrasco, 2006, 2011). For example, we often prepare for an expected event by orienting attention to a location or an object and to the time of the event (LaBerge, 1995). Covert endogenous attention is, therefore, the most relevant type in the debate about cognitive penetration.

#### Edited by:

*Athanassios Raftopoulos, University of Cyprus, Cyprus*

#### Reviewed by:

*Kateryna Samoilova, California State University, Chico, USA*

#### \*Correspondence:

*Dimitria Electra Gatzia dg29@uakron.edu*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *17 October 2016* Accepted: *24 April 2017* Published: *10 May 2017*

#### Citation:

*Gatzia DE and Brogaard B (2017) Pre-cueing, Perceptual Learning and Cognitive Penetration. Front. Psychol. 8:739. doi: 10.3389/fpsyg.2017.00739*

<sup>1</sup> See, for example, Wu (in press): "If a cognitive system contains information R such that vision computes over R where this computation explains why the visual system yields O rather than some other output On, given visual input I, then cognition cognitively penetrates vision."

Wu (in press) defines attention as selection for action (see also Wu, 2014) and argues that covert (endogenous) attention penetrates visual processing. The claim that action requires attention is supposed to follow from what Wu (2011) calls the "Many-Many Problem," which can be described as follows. A subject is confronted with multiple targets simultaneously, say, a football and a soccer ball, and has the option of either kicking the soccer ball with the right foot or the football with the left foot. While keeping her gaze fixed, she kicks the soccer ball with the right foot, even though she could have kicked the football with the left foot. The supposition that the subject's gaze remains fixed is needed to ensure that this is a case of covert attention (and not a case of overt attention, where cognition controls visual experience by controlling what gets into the mind via the eyes). According to Wu, a consequence of the selection for action view is that when a subject selects a target to guide response in a task (e.g., kick the soccer ball), that subject attends to that target.

It has been suggested that Wu's response to the Many-Many Problem is based on an oversimplified distinction between actions and reflexes (see Jennings and Nanay, 2016). Wu distinguishes between actions, which require a many-many mapping between stimulus and response (e.g., kicking the soccer ball with the right foot or kicking the football with the left foot), and pure reflexes, which exhibit only a one-one mapping between stimulus and response (Wu, 2011, p. 54). However, this "one-one mapping" claim can take either a weak or a strong modal reading: on the weak reading, each type of stimulus corresponds to one type of response; on the strong reading, the stimulus type necessitates a type of response (see Jennings and Nanay, 2016). Both readings are problematic. The weak reading is too weak to distinguish pure reflex from action (for example, ordering the same type of coffee every time you visit your favorite cafe would be a case of reflex on this reading). The strong reading is too strong to capture paradigmatic cases of pure reflex (for example, cases of the knee-jerk reflex where the stimulus is present without the triggering kick response) (Jennings and Nanay, 2016).

Leaving this issue aside, let us return to the question of whether covert endogenous attention penetrates perceptual processes. Clearly, not all cases of covert attention involve cognitive penetration. The consensus is that cases of ambiguous figures are not instances of cognitive penetration because what triggers the changes in the perceptual process is that "different image regions are selectively processed over others because such regions are attended differently in relatively peripheral ways" (Firestone and Scholl, 2016, p. 14; see also Raftopoulos, 2013). There are, however, additional reasons for denying that covert endogenous attention penetrates perceptual processes. Specifically, it seems that pre-cueing effects associated with covert attention are similar to perceptual learning effects, despite the former having more transient effects on attention than the latter.

Perceptual learning involves relatively long-lasting (developmental or evolutionary) changes to an organism's neural circuitry associated with specific ecological factors, which typically improve its ability to respond to its environment (see Gibson and Gibson, 1955; Hall, 1991; Karni and Sagi, 1991; Fahle and Morgan, 1996; Bruce and Burton, 2002; Purves et al., 2015). Attentional weighting<sup>2</sup> involves relatively long-lasting changes to an organism's perceptual system that allow it to increase attention allocated to stimuli that have significance and decrease attention allocated to extraneous stimuli (Goldstone, 1998).

Covert endogenous attention modulates visual processes that occur early on in the processing hierarchy (as does covert exogenous attention, albeit via different mechanisms; for reviews see Carrasco, 2011 and Pinto et al., 2013).<sup>3</sup> For example, covert endogenous attention has been found to boost luminance contrast (first-order information) and contrast sensitivity (second-order information; Carrasco et al., 2004; Barbot et al., 2012), enhance spatial resolution (Carrasco et al., 2006), and reduce crowding (Yeshurun and Rashal, 2010). However, the current evidence indicates that there are clear constraints on top-down influences at all levels of information processing (Knill and Richards, 1996; for a review see Teufel and Nanay, 2017).

Typically, high-level cognitive processing influences low-level perceptual processing by facilitating the integration of relevant information at a global level with sensory inputs (Brunner and Goodman, 1947; Knill and Richards, 1996). The ultimate aim of top-down influences is the optimization of information processing at the system level, which is largely attributable to the fact that organisms exploit the statistical properties of natural scenes (Simoncelli and Olshausen, 2001; Geisler, 2008; Purves et al., 2015). For example, when retinal image information is inadequate, the perceptual system relies more heavily on prior probability distributions of different possible environmental states and adjusts its estimates accordingly (Knill and Richards, 1996; Torralba, 2003).
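The claim that the perceptual system leans more heavily on priors when the retinal image is inadequate can be illustrated with the textbook case of combining a Gaussian prior with a Gaussian sensory measurement. This is a minimal sketch under that simplifying assumption; it is not the model used in the studies cited above, and the function name and numerical values are hypothetical.

```python
def posterior_estimate(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior with one Gaussian observation.

    Returns (posterior mean, posterior variance, weight given to the prior).
    """
    # The noisier the observation, the more weight the prior receives.
    w_prior = obs_var / (obs_var + prior_var)
    mean = w_prior * prior_mean + (1.0 - w_prior) * obs
    var = (prior_var * obs_var) / (prior_var + obs_var)
    return mean, var, w_prior


if __name__ == "__main__":
    # Same prior and same measurement; only the sensory noise differs.
    for obs_var in (0.25, 1.0, 9.0):
        mean, var, w = posterior_estimate(prior_mean=0.0, prior_var=1.0,
                                          obs=10.0, obs_var=obs_var)
        print(f"obs_var={obs_var}: estimate={mean:.2f}, prior weight={w:.2f}")
```

As the assumed sensory variance grows, the estimate moves away from the measurement and toward the prior mean, which is the sense in which an ideal observer "adjusts its estimates" when the input is poor.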

Covert endogenous attention can adjust to optimize performance depending on task demands (Barbot et al., 2012; see also Lee and Schmidt, 2005). This flexibility, however, comes at a cost. Attentional highlighting of information occurs even when it has adverse behavioral effects. For example, when a letter that was used consistently as the target in a detection task becomes a distractor (i.e., the stimulus to be ignored), it automatically captures attention (Shiffrin and Schneider, 1977). The converse effect, i.e., negative priming, also occurs. Subjects respond to targets that served as distractors in the past more slowly than to newly introduced stimuli (Tipper, 1992), and the effect of such previous exposures can last upward of 2 weeks (Fox, 1995). These studies show that pre-cueing effects associated with covert attention are similar to perceptual learning effects associated with attentional weighting. The main difference is that the former have more transient effects than the latter. While some consider perceptual learning to be a special case of cognitive penetration, viz. diachronic cognitive penetration (Churchland, 1988; Cecchi, 2014), the general consensus is that perceptual learning does not involve cognitive penetration (Fodor, 1988; Raftopoulos, 2001; Arstila, 2016).

<sup>2</sup> This is one of the four mechanisms of perceptual learning that have been identified: attentional weighting, imprinting, differentiation, and unitization (Goldstone, 1998).

<sup>3</sup> Barbot et al. (2012) found that while the effects of exogenous attention are a function of the second-order spatial frequency content, endogenous attention affects second-order contrast sensitivity irrespective of the second-order spatial frequency content.

Thus far, we pre-supposed that covert endogenous attention differs significantly from overt attention. However, studies indicate that the oculomotor system (the part of the central nervous system having to do with eye movements) is activated wherever spatial attention is allocated. Specifically, it is activated not only during the endogenous direction of covert attention to a cued location (e.g., the soccer ball) but also when covert attention is directed to an uncued location (e.g., the football; Van der Stigchel and Theeuwes, 2007). Spatial attention plays a major role in natural vision in facilitating the accurate targeting of saccades. It also ensures seamless perceptual transitions between discrete glances, which are accomplished by focusing resources on the saccadic goal across multiple levels of processing (Zhao et al., 2012; Bachmann and Francis, 2016). Eye movement typically precedes a motor action by a fraction of a second (Land and Hayhoe, 2001). These findings suggest that covert endogenous attention may involve attentional shifts, albeit less apparent than the shifts involved in overt attention. The differences in the outputs can thus be attributed to selectively attending to a different object or a different feature of the same object (see also Firestone and Scholl, 2016).

Taken together, the current evidence suggests that covert endogenous attention does not penetrate perceptual processes: the effects of covert attention can be attributed either to processes that resemble perceptual learning or to attentional shifts. Of course, perceptual processes could be penetrated by cognitive states. The point is that the current evidence indicates that they are not penetrated by covert attention.

### AUTHOR CONTRIBUTIONS

Both authors made an equal contribution to the paper.

#### ACKNOWLEDGMENTS

We would like to thank the reviewer for invaluable comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Gatzia and Brogaard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pre-cueing, the Epistemic Role of Early Vision, and the Cognitive Impenetrability of Early Vision

#### Athanassios Raftopoulos\*

Department of Psychology, University of Cyprus, Nicosia, Cyprus

I have argued (Raftopoulos, 2009, 2014) that early vision is not directly affected by cognition since its processes do not draw on cognition as an informational resource; early vision processes do not operate over cognitive contents, which is the essence of the claim that perception is cognitively penetrated; early vision is cognitively impenetrable. Recently it has been argued that there are cognitive effects that affect early vision, such as the various pre-cueing effects guided by cognitively driven attention, which suggests that early vision is cognitively penetrated. In addition, since the signatures of these effects are found in early vision, it seems that early vision is directly affected by cognition since its processes seem to use cognitive information. I defend the cognitive impenetrability of early vision in three steps. First, I discuss the problems the cognitive penetrability of perception causes for the epistemic role of perception in grounding perceptual beliefs. Second, I argue that whether a set of perceptual processes is cognitively penetrated hinges on whether there are cognitive effects that undermine the justificatory role of these processes in grounding empirical beliefs, and I examine the epistemic role of early vision. Third, I argue that the cognitive effects that act through pre-cueing do not undermine this role and, thus, do not render early vision cognitively penetrable. In addition, they do not entail that early vision uses cognitive information.

#### Edited by:

Narayanan Srinivasan, Allahabad University, India

#### Reviewed by:

Robert Eamon Briscoe, Ohio University, United States

Snehlata Jaswal, L M Thapar School of Management, India

#### \*Correspondence:

Athanassios Raftopoulos raftop@ucy.ac.cy

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 17 September 2016 Accepted: 26 June 2017 Published: 10 July 2017

#### Citation:

Raftopoulos A (2017) Pre-cueing, the Epistemic Role of Early Vision, and the Cognitive Impenetrability of Early Vision. Front. Psychol. 8:1156. doi: 10.3389/fpsyg.2017.01156

Keywords: cognitive penetration, early vision, pre-cueing effects, epistemic role of perception, attention

# INTRODUCTION

In previous work (Raftopoulos, 2009, 2014, 2015), I argued that a stage of visual processing, i.e., early vision, is not directly affected by cognition in that its processes do not receive any cognitive feedback in a way that would justify the view that early vision draws on cognition as an informational resource, or, to put it differently, that early vision processes operate over cognitive contents, which is the essence of the claim that perception is cognitively penetrated (see also Pylyshyn, 1999). Early vision, thus, is cognitively impenetrable. This thesis was based on neuroimaging and electrophysiological studies suggesting that cognitively driven attention directly affects perception only at the time scale of late vision that succeeds early vision.

It has been recently argued by philosophers (Cecchi, 2014; Ogilivie and Carruthers, 2015) and cognitive scientists (Vetter and Newen, 2014; Goldstone et al., 2015; Lupyan, 2015) that various pre-cueing attentional effects directly modulate early visual processing itself, in that the signatures of these effects are found within early vision, and since these effects involve cognition, early vision is cognitively penetrated. Fazekas and Nanay (2017) note that if pre-cueing is construed as the expression of cognition driving attention, it would be easy for a defender of the cognitive impenetrability of early vision to counter that pre-cueing is an indirect cognitive effect that, as such, does not entail that early vision is cognitively penetrated. If, however, pre-cueing is seen as the result of the effects of mental imagery on early visual processes, then it entails that early vision is cognitively penetrated.

I defend the cognitive impenetrability of early vision in three steps. First, I briefly state the thesis that early vision is cognitively impenetrable. In doing so, I explain what early vision is, and I also define cognitive penetrability in broad terms. In the second section, to assess the contention that pre-cueing effects entail the cognitive penetrability of early vision, I examine these effects and argue that they do not directly affect early vision since they do not influence its role in visual processing, to wit, retrieving information from the environment. In the last section, I argue that whether a class of cognitive effects on early vision should be deemed a case of cognitive penetrability hinges on whether these effects affect the epistemic role of early vision, since all discussions in the philosophical literature concerning cognitive penetrability are interwoven with the epistemic repercussions of cognitive penetrability.

# EARLY VISION AND COGNITIVE PENETRABILITY

Early vision includes a feed forward sweep in which signals are transmitted bottom-up. In visual areas (from LGN to IT) the feed forward sweep lasts for about 100 ms. It also includes a stage at which lateral and recurrent processes that are restricted within the visual areas and do not involve cognitive signals occur. Recurrent processing in early visual areas starts at about 80–100 ms and culminates at about 120–150 ms. Lamme (2003) calls it local recurrent processing. The unconscious feed forward sweep extracts high-level information that may lead to categorization and results in some initial feature detection. Local recurrent processing produces further binding and segregation. Studies show that there are early feedback loops, say, from LGN or V1 to MT/V5 and then back to V1, where the recurrent signals engage V1's neurons to perform different tasks from those performed when V1 received feedforward signals from the LGN (Heinen et al., 2005; Plomp et al., 2015; Drewes et al., 2016).

I said that the feed forward sweep might lead to early categorization. Let me explain this. Familiarity may affect object classification (whether, for example, an image portrays an animal or a face), a process that occurs in short latencies (95–100 ms and 85–95 ms, respectively) (Kirchner and Thorpe, 2006; Liu et al., 2009; Crouzet et al., 2010). Researchers agree that the early classifications in the brain result from the feed forward sweep (FFS) and do not involve cognitive information, nor do they require the activation of object memories. The brain areas involved are low-level visual areas (including the FEF, frontal eye fields) from V1 to no higher than V4 (Kirchner and Thorpe, 2006), or perhaps a bit more upstream to posterior IT (Peterson, 2003) and lateral occipital complex-LO (Grill-Spector et al., 1998).

The early effects of familiarity may be explained by invoking contextual associations (target-context spatial relationships) that are stored in early sensory areas to form unconscious perceptual memories (Chaumon et al., 2008), which, when activated from incoming signals that bear the same or similar target-context spatial relationships, modify the feed forward sweep of neural activity resulting in the facilitating effects. Thus, what is involved in the phenomenon are certain associations built in the early visual system that once activated speed up the feed forward sweep. This is clearly not a case of top-down cognitive effects on early visual processing.

The early effects may also be explained by appealing to configurations of properties of objects or scenes. Neurophysiological research (Grill-Spector et al., 1998, 2006), psychological research (Peterson, 2003), and computational modeling (Ullman et al., 2002) suggest that implicit associations representing fragments of objects and shapes, or "edge complexes," as opposed to whole objects and shapes, are stored in early visual areas. The associations that are built, through learning, in early visual circuits reflect the statistical distribution of properties in environmental scenes (Van Rullen and Thorpe, 2001; Delorme et al., 2004). The statistical differences in physical properties of different subsets of images are detected very early by the visual system before any top-down semantic involvement, as is evidenced by the elicitation of an early deflection in the differential between animal-target and non-target ERPs at about 98 ms (in the occipital lobe) and 120 ms (in the frontal lobe). The low-level cues could be retrieved very early in the visual system from a scene by analyzing the energy distribution across a set of orientation and spatial frequency-tuned channels (Torralba and Oliva, 2003). This suggests that rapid image classification relies on low-level or intermediate-level cues (Ullman et al., 2002) that act diagnostically, allowing the visual system to predict the gist of the scene and classify images fast. These cues may be provided by coarse visual information, say by low-level spatial frequency information, and the visual system does not have to rely on high-level fully integrated object representations in order to be able to classify visual scenes rapidly.

In Raftopoulos (2009), I argued that early vision processing is cognitively impenetrable because it is not affected directly by cognitively driven attention although attention may affect pre-early vision and post-early vision stages of visual perception. Specifically, cognitively driven spatial attention may determine where one focuses before the presentation of the stimulus and, thus, before the onset of early vision. Or, feature/object based attention may prepare (more about this when I discuss pre-cueing) the perceptual system to process some items in the visual scene faster and more effectively by setting up the values of some parameters of the rules governing the state transformations during perception but the processes themselves of early vision are not affected by attention; attention sets up, as it were, the initial conditions in the transformation equations but the equations themselves are not affected. Finally, attention affects perceptual processing during late vision, which is a post-early vision perceptual stage. This entails that signal transmission during early vision is not affected by top-down signals produced in cognitive areas and is restricted within the visual areas of the brain. Thus, the processes of early vision do not use cognitive information as an information resource and this makes early vision cognitively impenetrable. Pylyshyn (1999) reaches the same conclusion using psychological and behavioral evidence, whereas Raftopoulos (2009) relied on neuropsychological and imaging studies.

The processes of early vision retrieve from the environment the information that will eventually allow perception of a visual scene with as much accuracy as possible. In order to do so, early vision gradually constructs representations of increasing complexity (from variations in light intensities it extracts edges, from edges blobs, from blobs it extracts two-dimensional surfaces, and from these it infers the 2½-D sketch). The representations formed in early vision comprise information about spatio-temporal and surface properties, the shape of the object as viewed by the perceiver, color, texture, orientation, motion, and affordances of objects, in addition to the representations of objects as bounded, solid entities that persist in space and time.

In the discussion thus far I have extensively used the term 'cognitive penetrability.' Let me say a few things about what this term means. The term is intended to cover the cognitive influences on perception such that the contents of cognitive states affect the contents of perceptual states through the causal interaction of the cognitive and perceptual states that carry these contents. It is unanimously agreed upon in the relevant literature (the reasons for this are beyond the scope of this paper) that this interaction, in order to signify cognitive penetrability, must be purely mental and should not involve any eye or bodily movements.

For Siegel (2011, 2013, 2016, p. 4), 'cognitive penetrability' covers all cases of influences on the contents of experience by prior mental states, including cognitive and emotive states, which causally affect the content of perception such that they influence how things look. Thus, cognitive penetrability occurs when the cognitive effects affect not the selection of the input but perceptual processing itself.

> If visual experience is cognitively penetrable, then it is nomologically possible for two subjects (or for the same subject in different counterfactual circumstances, or at different times) to have visual experiences with different contents while seeing and attending to the same distal stimuli under the same external conditions, as a result of differences in other cognitive (including affective) states. (Siegel, 2011, p. 5–6).

This is a useful definition of cognitive penetrability because it incorporates the basic desiderata for a conception of cognitive penetrability that is philosophically interesting. It establishes that for cognitive penetrability to occur the same stimulus should be seen. This immediately excludes from being instances of cognitive penetrability various attentional shifts that change the incoming input. In the literature, these cases are unanimously considered not to be cases of cognitive penetrability. It also, much more controversially, excludes any attentional effects from entailing cognitive penetrability because, according to Siegel, they are merely selectional effects that determine the input; the various selection effects where attention selects the input are not cases of cognitive penetrability (Siegel, 2011, 2013, 2016). I think that Siegel is wrong to exclude attention as a potential source of cognitive penetrability since attentional selection effects do occur in late vision and render late vision cognitively penetrated, but since I do not have the space to discuss this problem I will simply assume that when cognitively driven attention modulates perceptual processing, this process is cognitively penetrable. This assumption does not affect the main discussion since I argue that attention does not directly affect early vision in any case.

Siegel's view that cognitive penetrability (CP) occurs when cognitive states affect perceptual processing itself, if conjoined with the thesis that when they do so the affected perceptual processes operate upon cognitive information, accords with one of Pylyshyn's (1999) constant themes on cognitive penetrability. This is the thesis that cognition affects perception so that perception and cognition could be deemed to be continuous if cognition causally influences perception directly, that is, if the perceptual processes operate upon the information contained in the affecting cognitive states.

# EARLY VISION AND PRE-CUEING

I have argued (Raftopoulos, 2009, 2014, 2015) that early vision is cognitively impenetrable because it is not directly affected by cognition since the processes of early vision do not use cognitive information, as they do not operate over the contents of any cognitive states. My arguments were based on empirical evidence showing that object/feature based attention and cognitively driven or endogenous spatial attention are delayed and affect the visual areas of the brain (from V1 to IT) after 150 ms post-stimulus, which means that their effects are felt in the visual areas of the brain after the time frame of early vision.

I have also argued that even though the ERP marker of spatial attention P1 is within the time frame of early vision, P1 is in effect the neuronal index of the effects of exogenous, bottom-up spatial attention and, thus, does not signify that early vision is cognitively penetrated. The P1 wave (a component of the ERP waveforms) is larger in amplitude for stimuli presented at the attended location than for stimuli presented at the unattended location. Since the difference is due to the attended location, it is reasonable to assume that the amplitude of the P1 wave is modulated by spatial attention. The effect begins 70–90 ms after stimulus onset, which means that it is clearly an early perceptual and not a post perceptual effect. Spatial selective attention increases the activation of the neural sites tuned to the selected loci. The effect is sensitive to stimulus factors such as contrast and position. It occurs before the identification of the stimuli and is insensitive to the identity of the stimuli. It is independent of the task-relevance of the stimulus, since it is observed for both targets and non-targets. It is also independent of the nature of the task, since it is observed for a variety of tasks ranging from passive viewing to active searching locations. The effect is also insensitive to the cognitive states of the observers (expectations, desires, beliefs, etc.). In that sense, P1 is thought to be an exogenous sensory component elicited by the onset of a stimulus at the attended location.

Recently, philosophers (Cecchi, 2014; Ogilivie and Carruthers, 2015) and cognitive scientists (Vetter and Newen, 2014; Goldstone et al., 2015; Lupyan, 2015) argued against my view that early vision is cognitively impenetrable on the ground that there is empirical evidence suggesting that cognitively driven object/feature-based and spatial attention modulate perceptual processing during early vision. Many studies show that when subjects are instructed to attend to a certain location or attend for a certain object/feature to appear, the neuronal assemblies in the visual brain whose receptive fields are within the attended location, or the neuronal assemblies that encode the feature indicated by these instructions receive a boost in their activation as a result of these instructions and this boost occurs before the appearance of the stimulus. This means that cognitive effects affect perceptual processing from its inception, and, hence, they also affect early vision, rendering it cognitively penetrated.

Cognitive effects are involved in this process because the instructions determine attentional commands (wait for a red letter A to appear, or attend to the upper left part of the screen) to be carried out and these commands require that the subject understand them. When subjects are instructed that a red object will appear on a screen, they use their cognitive resources to understand the instruction and activate their knowledge concerning the color red by activating the neuronal assemblies in the cognitive centers of the brain that store this knowledge. The activation is spread top-down and increases the base-line activation of the neuronal assemblies in the visual areas of the brain that encode the color red. This is a typical example of a cognitively driven attentional effect. Such instructions function as cues directing attention and, since they are given before stimulus presentation, the experimental setting is called pre-cueing. Pre-cueing can occur by cues presented on a screen without any accompanying verbal instructions, as when an arrow 'up' appears on a screen. These cues generate attentional commands because the subject understands them and the ensuing attentional effects are cognitively driven.

Pre-cueing effects seem to pose a problem for the thesis that early vision is cognitively impenetrable because pre-cueing seems to entail that the processes of early vision are directly affected by cognition in the sense that they operate over some cognitive information. This is a problem because, as we saw, many definitions of cognitive penetrability hinge on whether some perceptual process is directly affected by cognition; should any direct effects on early vision be found, this would entail that early vision is cognitively penetrated.

The cognitive effects on early vision that I discuss are the cases of pre-cueing effectuated by covert shifts of attention. I do not discuss the indirect cognitive effects consisting in shifts of cognitively driven overt attention because these effects are realized through eye- or body-movements and, thus, introduce an external factor in the causal chain by which cognition affects perception, and the existence of such an external factor is almost unanimously thought not to entail cognitive penetrability. Whenever viewers are instructed to attend to a certain location or to wait for a certain feature or object to appear, or when they implicitly or explicitly expect some object or feature to appear at a certain location or they expect a specific object or feature to appear on the screen, attention affects perception by modulating internal processing, biasing the base-line activation of the neurons that encode the expected stimulus or location. By being internal and not external, this sort of attentional effect is a viable candidate as a cause of cognitive penetrability of early vision.

A word of caution is needed first. I talked about instruction to attend to some location or object/feature, and about expectations that some space will be occupied or that some specific object/feature will appear on the screen, and I continued to subsume both attention and expectation effects under the general heading of attentional effects. But expectation and attention are different. When someone expects something, they operate on, or express, information concerning the statistical distributions of objects and spaces in their environment; when expecting O to appear, one attributes an elevated probability to O's presence in one's environment. Attention, in contrast, is thought of as a mechanism that allows one to focus on, or zoom in on, what is relevant for one's purposes. There is empirical evidence showing that the probability of stimulus occurrence and task-relevance can be independently manipulated, suggesting that expectations are dissociated from feature-based attention (Kok et al., 2013, 2014). Thus, one should treat the effects of attention and expectation differently. One should note, however, two things. First, even if they are different in nature, their effect on the early visual circuits is the same, as we shall shortly see. Second, this dissociation presupposes a conception of attention as some sort of mechanism that acts on information. As I have argued (Raftopoulos, 2009), however, attention is best viewed as the result of the biased competition among pieces of information along the visual circuits. The biases may involve top-down cognitive information and both prior expectations and attentional commands are such biases. If true, this would also explain why they act the same way on visual neurons and it would also allow one to treat them as the same sort of cognitive effect.

Studies of the effects of spatial attention cues presented to a viewer before stimulus presentation show early modulation of perceptual processing (Freiwald and Kanwisher, 2004; Reynolds and Chelazzi, 2004; Carrasco, 2011). Attending to a location may enhance the base-line activation of the neuronal assemblies tuned to the attended location in specialized extrastriate areas V2, V3, V3a, V4, and in parietal regions (Kastner and Ungerleider, 2000; Freiwald and Kanwisher, 2004; Heeger and Ress, 2004; Hopfinger et al., 2004) and in striate cortex V1 (Kastner et al., 1999). This phenomenon refers to the enhancement of the baseline activity of neurons at all levels in the visual cortex that are tuned to a location that is cued and thus this location is attended before the onset of any stimuli. It is called attentional modulation of spontaneous activity. The spontaneous firing rates of neurons are increased when attention is shifted toward the location of an upcoming stimulus before its presentation.

This cueing is thought to reflect the effects of the neural processes that occur in response to cues to orient attention to a specific location before the stimulus appears. Spatial attention enhances the sensitivity of the neurons tuned to the attended spatial location by improving the signal-to-noise ratio of the neurons tuned to the attended location over the neurons with receptive fields outside the attended location that contribute only noise. This effect does not determine what viewers perceive in that location because enhancing the responses of all neurons tuned to the attended location, independent of the neurons' preferred stimuli, keeps the differential responses of the neurons unaltered and thus does not affect what is perceived. To put it differently, spatial attention determines the focus of the gaze but does not solve the gazing problem of attention. What is perceived depends on the relative activity of appropriate assemblies of neurons that selectively code the features of the stimulus compared to the activity of assemblies that do not code the features of the stimulus and contribute noise. Since the percept depends on the differential response of these assemblies, this effect of spatial attention, by not evoking differential responses, leaves the percept unchanged; it makes detection of the objects/features in the scene easier but does not determine the percept.
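The reasoning in the preceding paragraph can be made concrete with a toy calculation. The sketch below assumes, for illustration only, that spatial attention acts as a single multiplicative gain on every neuron tuned to the attended location and that neurons outside that location contribute a fixed amount of noise; the firing rates, the gain value, and the function names are hypothetical.

```python
def apply_gain(responses, gain):
    """Scale every response by the same factor (a uniform spatial-attention gain)."""
    return [gain * r for r in responses]


def relative_pattern(responses):
    """Each response as a fraction of the summed response (the differential pattern)."""
    total = sum(responses)
    return [r / total for r in responses]


# Hypothetical firing rates of three neurons tuned to the same location,
# each preferring a different feature.
unattended = [2.0, 8.0, 4.0]
attended = apply_gain(unattended, gain=1.5)

# Hypothetical fixed activity of neurons with receptive fields elsewhere.
background_noise = 3.0

snr_unattended = sum(unattended) / background_noise
snr_attended = sum(attended) / background_noise

print("relative pattern, unattended:", relative_pattern(unattended))
print("relative pattern, attended:  ", relative_pattern(attended))
print("signal-to-noise, unattended vs attended:", snr_unattended, snr_attended)
```

On these assumptions the signal-to-noise ratio at the attended location rises, while the relative pattern of responses across the three neurons, and hence which feature is most strongly signaled, stays the same.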

Evidence (Liu et al., 2007; Shibata et al., 2008; Carrasco, 2011; Wyart et al., 2012; Kok et al., 2013, 2014) also suggests that through pre-cueing of object features (instructing a subject to look at a screen for a red object, for example, or when a subject expects a particular grating to appear) feature-based attention modulates pre-stimulus activity in the visual cortex. In fMRI experiments designed to examine the effects of feature attention to color and motion on the visual, frontal, and parietal areas, a cue appeared 1 s before the stimulus. The activity within the color sensitive visual areas and the motion sensitive visual areas was increased by attention to color and motion, respectively. This resulted in the relevant visual areas that encode color showing enhanced activation as early as 80 ms after stimulus presentation.

The effects of pre-stimulus feature attention or pre-cueing may act either as a preparatory activity to enhance stimulus-evoked potentials and, thus, the sensitivity to the cued feature, within feature sensitive areas, or they may act to modulate stimulus-locked transients suppressing neural noise. In either case, they make the detection of the target easier, less expensive, and faster. Thus, the preparatory activity that occurs through pre-cues that rely on feature/object based attention increases the base-line firing rate of the neurons preferring the attended stimulus that the participant is instructed to attend to or for which a cue is presented before the presentation of the stimulus. These effects are widespread from V1, V2 to upper levels of perceptual processing. Research suggests [see Montemayor and Haladjian (2015, p. 41–42), and Raftopoulos (2009, Chapter 2), for a list of the relevant research] that the objects in a visual scene are individuated and sometimes are categorized by early vision irrespective of whether they are targets or non-targets or are cued or not, which means that early vision retrieves the required information and individuates all objects in a visual scene, despite the modulation of the pre-stimulus activity due to object/feature-based pre-cueing.

Both effects of pre-cueing reflect a change in background neural activity. These effects are called anticipatory effects and are established prior to viewing the stimulus. In this sense, they do not modulate processing during stimulus viewing but they bias the process before it starts; they do not affect perceptual processing on-line. There are various interpretations of the effects of pre-cueing on the neural activity in the occipital areas of the brain. They may act so as to increase the base-line firing rates of the neurons that encode the pre-cued stimuli; these are cases of gain modulation. Alternatively, they may act so as to suppress noisy neural activity rather than to increase the activity of the neurons that encode the information contained in the pre-cueing signal (Murray et al., 2004; Hegde and Kersten, 2010). It may also be that a variety of mechanisms are available and which one is chosen depends on the task at hand, which means that attention can flexibly solicit different ways to modulate the activity of neurons so as to change visual representations at a cellular level and affect the functional properties of neurons (Gilbert and Li, 2013). In all these cases the net result is the same: anticipatory activity sharpens and optimizes the response properties of the affected neurons according to the anticipated stimulus (and this happens independent of whether a stimulus is expected as more likely to appear, or attended to as more relevant to the viewer's purposes). As such, anticipatory effects do not emerge as part of perceptual competition and in this sense they are not intrinsic to perceptual processing (Nobre et al., 2012, p. 161), which is otherwise unaffected by top-down effects. During the feed forward sweep (FFS) and local recurrent processing (LRP) there are no top-down cognitive effects due to pre-cueing, which means that the perceptual processes are data-driven.

What pre-cueing does is to set up the values of some parameters of the transformation rules in feed forward processing. When they set the parameters of the transformation rules, the pre-cueing effects highlight some information in the visual scene, by enhancing the activation of the neurons that encode this information, but they do not create the proximal image or stimulus. What they essentially do is to modulate early perceptual filters; in this sense, they act "as a 'filter' that 'selects' the information for downstream processing, which may itself be impervious to cognitive influence" (Firestone and Scholl, 2016, p. 23–24). These parameters can be construed as the attentional parameters that weight the effect of sensory signals, as they are postulated in computational models of perceptual attention, such as the model of divisive normalization proposed by Lee and Maunsell (2009). Pre-cueing may increase the value of some parameter and decrease that of another and this results in some input being given priority in terms of subsequent processing but this does not mean that early vision does not retrieve all information in the visual scene.
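A minimal sketch may help to fix ideas about what it means for pre-cueing to set parameter values without changing the transformation rule. The code below uses a generic divisive-normalization form, in which each unit's weighted sensory drive is divided by the pooled weighted drive plus a constant. This generic form, the function name `normalized_response`, the constant `sigma`, and all numerical values are assumptions made for illustration; they are not the equations or the fitted parameters of Lee and Maunsell (2009).

```python
def normalized_response(drives, weights, sigma=1.0):
    """Generic divisive normalization (illustrative form, an assumption).

    Each unit's response is its attention-weighted drive divided by
    sigma plus the sum of all attention-weighted drives.
    """
    if len(drives) != len(weights):
        raise ValueError("drives and weights must have the same length")
    weighted = [w * d for w, d in zip(weights, drives)]
    pool = sigma + sum(weighted)
    return [x / pool for x in weighted]


# Hypothetical sensory drives for two equally strong stimuli (arbitrary units).
drives = [10.0, 10.0]

# No cue: both attentional weights are equal.
neutral = normalized_response(drives, weights=[1.0, 1.0])

# Pre-cue for stimulus 0: only a parameter value changes,
# the rule inside normalized_response stays the same.
cued = normalized_response(drives, weights=[2.0, 1.0])

print("neutral:", neutral)
print("cued:   ", cued)
```

Under these assumed values, the cued stimulus receives a larger share of the response and the uncued stimulus a smaller one, yet the response to the uncued stimulus remains above zero. This mirrors the claim in the text that some input is given priority for subsequent processing while information about the other input is still retrieved.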

The pre-cueing effects do not select which information is retrieved from the visual scene once the visual scene has been determined; all information from the visual scene is retrieved in parallel in early vision. In the case of spatial pre-cueing, the anticipatory effects do not determine the percept, since pre-cueing enhances the responses of all neurons tuned to the attended location independently of the neurons' preferred stimuli and keeps the differential responses of the neurons unaltered. In the case of object/feature pre-cueing, although anticipatory effects enhance the activity of the neurons responding preferentially to the pre-cued object or feature, increasing the likelihood that they will eventually be selected for further processing, early vision still retrieves in parallel information concerning all the objects and features present in the visual scene, so that these objects are individuated independently of whether they are targets or non-targets.

When attention is used on-line, that is, during visual processing, cognitively driven selective attentional control selects for further processing a specific feature or object in a visual scene by increasing the firing rates of neurons that have a stimulus-evoked response to a particular stimulus; in this case, top-down signals modulate perceptual processing during stimulus viewing. In pre-cueing, processing during stimulus viewing in early vision relies solely on bottom-up processing or top-down and lateral processing restricted within visual areas. This is different from the role of attentional control during visual processing that involves top-down attentional control of the perceptual input.

If pre-cueing does not affect the information retrieved from the visual scene, the relevant cognitive states involved do not affect the selection of the 'evidence' or the information against which hypotheses concerning object identity will be tested in late vision. It follows that pre-cueing and the various cognitive effects underlying it do not affect the epistemic role of early vision. As I will explain in the next section, pre-cueing does not entail the cognitive penetrability of early vision.

There is an additional question that needs to be answered. As I have said, in the literature, cognitive penetrability goes hand in hand with the thesis that cognition directly affects early vision in the sense that the processes of early vision use the cognitive contents of the penetrating cognitive states as an informational resource. The question, thus, is the following: do the pre-cueing effects suggest that cognition directly affects early vision?

Since the cognitive states do not influence the retrieval of information from a visual scene, the cognitive states do not affect perceptual processing itself and, therefore their influence is not direct. This, however, needs arguing for. In view of the fact that the electrophysiological signatures of pre-cueing effects are found within the time frame of early vision, one must examine these electrophysiological signatures. One first response could be that they are carry-over effects of the initial enhanced activation of the relevant feature sensitive areas owing to pre-cueing, that is, of the anticipatory effect of pre-cueing. This means that the fact that they are found during early vision processing does not entail that the contents of the early vision states that participate in these processes are affected by cognitive information, or equivalently, that the processes of early vision operate over such cognitive contents. A way to express this is to say that even though pre-cueing effects set the attentional parameters that we discussed in the previous paragraphs and these parameters in turn affect perceptual processing, the pre-cueing effects act so as to set some initial values but they do not alter the equations that govern the state transformation in which the processing consists. It follows that pre-cueing does not affect the processes of early vision itself, and, thus, does not affect early vision directly.

This conclusion is reinforced by recent studies that examine the role of the FEF in pre-cueing. O'Shea et al. (2004) found early latencies for target/distractor discrimination tasks, as in their study the discrimination by FEF neurons was effective 100–120 ms after stimulus onset. O'Shea et al. (2004, p. 1063) note that the discrepancy in early latencies may be explained by the fact that the repetition of the same target/distractor combination likely resulted in feature priming across the 10 blocks of 80 trials in their experiment, and such priming has been shown to produce earlier target discrimination peaks in monkey FEF. It follows that the early onset of target vs. non-target discrimination was likely the result of some sort of feature pre-cueing.

The effects of TMS (Transcranial Magnetic Stimulation) on FEF in relation to pre-cueing were studied by Taylor and Nobre (2007), who applied TMS to the right FEF during the spatial cueing period of a covert attentional task. They found that inducing activity in the right FEF with TMS during the cueing period of a rule-guided covert endogenous attentional orienting task modulated ERPs recorded over visual cortex, which suggests that the TMS applied to FEF altered functional processes related to perception and attention in the visual cortex.

The FEF TMS had a causal impact on visual activity measured with ERPs (Taylor and Nobre, 2007). The earliest effect of TMS was a sustained negative deflection, which became significant after the third TMS pulse, during the interval between the cue and the visual stimulus. This negativity remained until 200 ms after stimulus onset. The data were normalized to a pre-TMS baseline period to emphasize ERP shifts occurring after warning cue onset but before visual stimulus presentation. The normalization shows that this negativity remained present in the ERP until 200 ms after stimulus presentation, which means that this negativity can be interpreted as an effect on visual processing at the time of the attentional modulation of the ERP. In view of the fact that the attentional modulation of the occipital visual areas is delayed in time and occurs after 170 ms post-stimulus, one would expect that the TMS applied to FEF would affect the neuronal activity in early visual areas with a similar time delay, if the TMS effects on FEF affected on-line visual processing by controlling top-down attention. Indeed, when Taylor and Nobre (2007) isolated the stimulus-evoked activity by using the peri-stimulus period as the baseline, ERPs differed significantly as a function of FEF TMS starting at 200 ms.
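The role that the choice of baseline plays in this argument can be illustrated with a short sketch. The code below builds a purely synthetic difference wave (it is not data from Taylor and Nobre, 2007) containing a sustained negative shift that begins before stimulus onset and a later stimulus-evoked component, and then subtracts the mean of two different baseline windows. The function names, the time windows, and the amplitudes are all hypothetical choices made for illustration.

```python
def baseline_correct(signal, times, window):
    """Subtract the mean of the samples whose time lies in [start, end)."""
    start, end = window
    base = [v for v, t in zip(signal, times) if start <= t < end]
    if not base:
        raise ValueError("baseline window contains no samples")
    mean = sum(base) / len(base)
    return [v - mean for v in signal]


def synthetic_wave(t):
    """Hypothetical difference wave in arbitrary units (not real data)."""
    # Sustained shift that starts in the cue-stimulus interval.
    sustained = -1.0 if -300 <= t < 200 else 0.0
    # Stimulus-evoked component that starts 200 ms after onset.
    evoked = 2.0 if t >= 200 else 0.0
    return sustained + evoked


# Time axis in ms relative to stimulus onset (0), one sample every 50 ms.
times = list(range(-600, 401, 50))
wave = [synthetic_wave(t) for t in times]

# Assumed windows: a baseline before the shift begins vs. one just before onset.
early_baseline = baseline_correct(wave, times, window=(-600, -400))
peri_baseline = baseline_correct(wave, times, window=(-100, 0))


def value_at(signal, t):
    """Return the sample of `signal` at time t (t must be on the time axis)."""
    return signal[times.index(t)]


print("at 100 ms, early baseline:", value_at(early_baseline, 100))
print("at 100 ms, peri baseline: ", value_at(peri_baseline, 100))
print("at 250 ms, peri baseline: ", value_at(peri_baseline, 250))
```

With the early baseline the sustained shift remains visible up to 200 ms after onset, whereas with the peri-stimulus baseline it is subtracted out and only the component that begins at 200 ms stands out. This is the sense in which the second normalization isolates stimulus-evoked activity from activity that was already under way before the stimulus appeared.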

The study by Taylor and Nobre (2007) makes it clear that the TMS is affecting on-going visual cortical activity even prior to visual stimulation, and it is not just affecting the visual cortex's generation of an ERP. These results mean that


late vision but outside early vision. This last result is very important because it shows that this study does not establish any cognitive effects on early vision but only on late vision.

Another way (suggested by Gross, this volume) is that the attentional parameters in computations provide an example of how cognitive contents can be accessed and operated over without their role in the computation being appropriately inference-like, that is, without there being a logical, reason-giving relation between the cognitive contents that issue the attentional commands that set the values of the attentional parameters and the contents of the perceptual states that participate in the affected perceptual process. This is important because one of the reasons adduced to support the thesis that early vision is cognitively impenetrable is that cognitive penetrability requires that the cognitive and the perceptual contents stand in a semantic, quasi-logical relation of the sort found in the way the premises of arguments provide reasons for their conclusion. Even though a computational transition might itself be deemed an inference, or inference-like, not all elements of the computation (the attentional parameters, for example) need be quasi-reason-giving. The attentional weights that are computationally relevant in Lee and Maunsell's model affect computations in a way that does not presuppose that the cognitive contents that set them actually stand in the appropriate reason-giving relation that cognitive penetrability requires.

# DOES PRE-CUEING ENTAIL THAT EARLY VISION IS COGNITIVELY PENETRATED?

In the previous section, I examined pre-cueing in detail. The conclusion drawn from that discussion was that pre-cueing does not affect early vision processing itself but acts on it only indirectly, since it does not affect the role of early vision, which consists in retrieving information from the environment. The problem now is to decide whether pre-cueing, given its indirect nature, entails that early vision is cognitively penetrated.

To understand better what is at stake with the idea that perception is cognitively penetrable, and decide accordingly whether pre-cueing entails the cognitive penetrability of early vision, one should go back to when the discussions about CP of perception started. Hanson (1958), Kuhn (1962), Churchland (1988) and others interpreted findings in psychology and neuropsychology as showing that cognitive states involving propositional/conceptual contents affect perception. This was used as a springboard to mount an attack on the received view in the philosophy of science that there is a theory-neutral observational basis on which a rational choice for empirical adequacy between competing theories could be made; the reasonable stance to adopt is that when a theory passes empirical testing it is selected, whereas when a theory fails to pass the empirical tests it is rejected in its current form. If perception is cognitively penetrable, perception becomes theory laden (perception is theory laden if the perceptual processes that produce it are affected by some background theory) and the choice between two alternative and mutually exclusive theories cannot be based on empirical testing. The reason is that, since the two theories belong to different paradigms (comprehensive conceptual frameworks), the observations, being interpretations made under the influence of the two alternative frameworks, differ across paradigms. It follows that there is not a common empirical basis on which the choice between the two theories could be based. From this ensues the incommensurability thesis that bars communication across paradigms; perceptions being modulated by theoretical commitments, the proponents of different paradigms perceive different worlds and assign different meanings to the same terms. This bars communication because there is no neutral basis on which to resolve matters of meaning.

Sellars (1956) sought to undermine one of the tenets of classical empiricism, to wit, the view that one could introspect perception independently of concepts and get to the world, which, thus, is revealed in its own guise without any conceptual influences. This 'given' can be used as a neutral basis on which to determine the adequacy of perceptual beliefs. Since the cognitive penetrability of perception undercuts the possibility of such a given, the justificatory role of perception is undermined.

The thread connecting these views is that perception cannot play the epistemological role assigned to it by empiricism because it does not provide a neutral ground on which to decide which of our cognitive schemes is true or false; to the extent that perception is cognitively penetrated, perception's role in grounding perceptual beliefs is undermined. The cognitive penetrability may affect the epistemic role of perception because it lessens the sensitivity of perception to the data, or because it introduces some sort of irrationality in perceptual processing.

The main motive, therefore, underlying discussions of cognitive penetrability was that cognitive penetrability was thought to undermine the epistemic role of perception in grounding perceptual beliefs, that is, to undermine the extent to which experience could justify some belief. It follows that a cognitive influence on perception is a case of cognitive penetrability if it undermines the epistemic role of perception. However, not all cases of cognitive penetrability undermine the epistemic role of perception and some may even benefit it. One should extend, thus, the definition of cognitive penetrability so that any cognitive influence that affects the epistemic role of perception should be deemed as a case of cognitive penetrability independent of whether it diminishes or enhances this role. This amounts to saying that cognitive influences on perception that do not affect its epistemic role should not be considered cases of cognitive penetrability.

Stokes (2015) argues that cognitive penetrability should be understood in terms of its consequences. This consequentialism captures what is important in all discussions of cognitive penetrability, namely, the consequences of cognitive penetrability for the epistemic role of perception, theory-ladenness of perception, rationality in science, constructivism, etc. According to Stokes, an adequate account of cognitive penetrability should describe a phenomenon (or a class of phenomena) that has implications for the rationality of science, the epistemic role of perception, etc. Stokes (2015) calls this the consequentialist constraint on analyses of cognitive penetrability. Stokes proposes disjunctive consequentialism, according to which ψ is cognitively penetrated if and only if ψ is a cognitive-perceptual relation that entails consequences for the epistemic role of perception. It should be noted that even though the original considerations to which Stokes points concerning the epistemic impact of cognitive penetrability presupposed that cognitive penetrability undermines the epistemic role of perception and, thus, that it has harmful effects, Stokes (2015, p. 88) notes that on certain occasions cognitive penetrability and the theory-ladenness it induces may be beneficial for perception rather than harmful. It follows that for Stokes, cognitive penetrability occurs when the cognition-perception relation that obtains affects the epistemic role of perception and not only when this relation downgrades perception, a view with which I fully agree.

Thus, I concur with Stokes that to determine whether some causal influence on perception counts as cognitive penetrability one should examine the effects of these influences on the epistemic role of perception. I propose, however, that cognitive influences on perception count as cases of cognitive penetrability if they have an epistemic impact on the justificatory role of perception and not only when they undermine the epistemic role of perception.

This sets the following condition for cognitive penetrability, which I call the epistemic criterion for CP.

Epistemic Criterion for Cognitive Penetrability: If perception (or a stage of it) is cognitively influenced in a way that renders it unfit to play the role of a neutral epistemological basis, by vitiating its justificatory role in grounding perceptual beliefs, perception (or a stage of it) is cognitively penetrated. If perception (or a stage of it) is cognitively influenced in a way that does not affect its epistemic role in justifying perceptual beliefs, it is cognitively impenetrable.

For the purposes of this paper, I will run this definition in parallel with the more standard definition that we discussed before, namely that a cognitive effect on perception is a case of cognitive penetrability if it affects perceptual processing directly in the sense specified above. In this paper, I will not address in depth the problem of the relations between the two definitions, although the discussion in this paper suggests that the two definitions of cognitive penetrability are inextricably linked. The epistemic criterion for cognitive penetrability entails that to determine whether a perceptual stage is cognitively penetrated, one should examine whether there are cognitive or emotive influences on this stage that affect its epistemic role in grounding perceptual beliefs.

One might argue that the claim that cognitive penetrability occurs only when cognition affects the epistemic role of perception is too strong. Cognitive influences that do not affect the epistemic role of perception are still cognitive influences and, thus, should constitute cases of cognitive penetrability. The discussion in this subsection shows only that there are many types of cognitive penetrability, some of which affect the epistemic role of perception. The objection is on the right track with one caveat. Discussing cognitive penetration, one expects cognition to penetrate perception and one could argue that when it does, it necessarily affects the epistemic role of perception. When, for example, cognitively driven attention indirectly affects perception by selecting the input, this is not a real case of cognitive penetration. Be that as it may, one could cogently distinguish between cases of cognitive penetrability that affect the epistemic role of perception and cases of cognitive penetrability that do not. However, I explained above why philosophers take an interest only in those cases in which cognition affects the epistemic role of perception and dismiss the other cases as, philosophically speaking, uninteresting. In keeping with this almost unanimous stance among philosophers, I restrict the term 'cognitive penetrability' to those cases in which cognition affects the epistemic role of perception, while recognizing at the same time that from the viewpoint of cognitive science it may make better sense to include in the class of cognitive penetrability all cases in which cognition affects perception independently of the ensuing philosophical repercussions.

Let me say a few things about the impact of cognitive penetrability on the epistemic role of perception. It is intuitive to think that perceptual experience provides defeasible evidence, or warrant, or rational support, or grounds, for endorsing beliefs. It does so directly without any intermediate mental states just because it is perceptual experience. Perceiving p provides prima facie justification, i.e., rational support, for the proposition p. This thesis constitutes the core of the experientialist theories of perceptual justification (Ghijsen, 2016, p. 2). There are many views concerning the way perception justifies perceptual beliefs, which are roughly divided into two main categories; those that fall within internalism and those that fall within externalism. According to internalism, the justification of perceptual beliefs by perception is independent of truth-related factors. Externalists reject this thesis. The two camps differ on the way they interpret and account for the problems that cognitive penetrability engenders for the epistemic role of perception.

Within internalism, the most classical view of perceptual justification is called perceptual or phenomenal conservatism or dogmatism (Markie, 2005; Pryor, 2005; Huemer, 2007; McGrath, 2013a,b; Tucker, 2014), which holds that if it perceptually seems to S that p, then, thereby, S has prima facie perceptual justification for the proposition p. Having an experience with content p suffices to give S immediate (meaning that S does not have to believe anything else) prima facie justification for p. One of the motives underlying this view is the so-called transparency of perceptual experience; perceptual experience is transparent in that when someone attends to their perceptual experience, they attend to the objects and properties the experience presents to them as in their environment. The phenomenology of the experience presents to them the world as being a certain way. Since perceptual experience presents to perceivers worldly states of affairs as in their environment, it is rational that they take what their perceptual experience offers at face value and form, prima facie, the belief whose content corresponds to the phenomenal character of their experience.

The problem that cognitive penetrability poses for the epistemic role of perception is that it threatens the role of perception in justifying perceptual beliefs. If prior beliefs affect perceptual processing, this affects the justificatory role of perception. It is arguable that if the belief that X is F causally affects perceptual processing of a visual scene in which an X is present and as a result a viewer has an experience with content "X is F" on which the viewer subsequently bases the belief that X is F, one has a right to suspect that the role of the prior belief in affecting the content of perception undermines the rational support for the perceptual belief; the belief is epistemically compromised. Siegel (2013, p. 702–703) calls the phenomenon of cognitive penetrability leading to epistemically compromised beliefs the downgrade principle.

According to Siegel (2013, p. 707), when the cognitive penetrability of an experience epistemically downgrades the experience by diminishing its justificatory role, this happens because the experience is formed through an irrational process; it is the irrational etiology of the experience that epistemically downgrades it (Siegel, 2013, p. 699–700). The irrational etiology of experience makes it serve as a carrier for forms of influences on beliefs that are epistemically bad. The experiences that are generated through an irrational process, i.e., those that are causally affected by prior mental states in a way that diminishes their justificatory role, generate ill-formed beliefs on account of their etiology.

Not all forms of cognitive penetrability lead to epistemic downgrade. Familiarity, expertise, and perceptual learning in general facilitate rather than hinder the justificatory role of perception (Lyons, 2011; Siegel, 2011, 2013). These are cases in which prior perceptual knowledge changes the way a scene looks, which allegedly is a case of cognitive penetrability, by affecting the features in a visual scene that become salient and, thus, are selected for further processing; expertise and familiarity facilitate pop-out of certain patterns that allow or speed up object recognition. Some forms of CP are beneficial for the viewer in that they increase the viewer's sensitivity to the visual information in the environment. If cognitive penetrability downgrades experience because of the irrational etiology it introduces, then some forms of cognitive penetrability do not introduce an irrational etiology.

Siegel proposes that to determine whether the influence of a prior mental state on an experience on which another (token) mental state is based epistemically downgrades the experience one should do the following. One should find, first, a belief with the same content as that of the experience. Then, one should find an etiology for this belief that is psychologically similar to the etiology of the experience. If this belief with this specific etiology is doxastically unjustified, the experience has an irrational etiology and has its justificatory role diminished. In other words, one should ask whether the processes leading to the experience, and from there to the belief that is based on this experience, are of the kind that when their corresponding psychological processes that pertain to beliefs are applied to beliefs lead to well-founded or to ill-founded beliefs. As Siegel (2013, p. 717) writes, in view of the fact that it is difficult to define a checkered experience in terms of a sort of cognitive penetrability that is bad, one should rely on one's sense "of which processes lead to ill-founded beliefs, and of which etiologies of experience are structurally similar to those."

Lyons (2011, 2015) and Ghijsen (2016), among others, have argued that an inferential, internalistic account cannot explain adequately why cognitive penetrability downgrades perception or why some cases of CP downgrade perception while others do not. I think that the main problem with inferentialism is that it is very hard to defend the 'Analogy Thesis,' that is, the view that there is a structural analog between perceptual processes and discursive inferences (Pylyshyn calls them 'quasi-logical'), that is, the sort of inferences that are involved in drawing inferences from some premises in thought. If the analogy thesis holds, perception is a rational process of belief fixation and the inferences used in perception do not differ from the inferences used in thought, which are called discursive inferences. These inferences are distinguished from 'inferences' as understood by vision scientists according to whom any transformation of signals carrying information according to some rule is a form of inference.

It is contestable that there are discursive inferences in either early or late vision (Hatfield, 2009; Raftopoulos, 2011). This undermines the argument that some perceptual processes could be deemed less rational on account of their structural affinities to less rational discursive inferences. In this paper, I assume that only by taking into consideration externalistic notions, such as the sensitivity of perception to the data, can one hope to achieve an adequate account of the role of cognitive penetrability in downgrading perception.

Siegel, in addition to the thesis that cognitive penetrability downgrades perception because it introduces an irrational etiology, also alludes or explicitly refers to the fact that CP downgrades perception because it diminishes the sensitivity of perception to the environmental data (Siegel, 2013, 2016). Therefore, the fact that in the rest of the paper the analysis of the cognitive effects on early vision hinges on whether they undermine the sensitivity of early vision to the data should not be opposed by Siegel.

Externalists hold the view that to understand the epistemic role of perception in grounding perceptual beliefs, one need invoke truth-related factors, such as the sensitivity of perception to the environmental data and the extent to which perception faithfully reflects the environment. Many externalists are sympathetic to the internalist view that even the person who suffers a cognitively penetrated experience has some prima facie reason to hold the corresponding belief. To argue that there is also some reason that the perceiver whose perceptual experience is not subject to cognitive penetrability has better or more evidence to believe the relevant proposition than the perceiver who has fallen prey to cognitive penetrability, the externalists must introduce a more fine-grained account of perceptual evidence that distinguishes between two layers of perceptual evidence and which should be based on truth-related factors.

The first layer must be shared both by the victim of cognitive penetrability and the non-victim, but the non-victim must also possess a second layer of evidence that the victim lacks and which puts the non-victim in a better epistemic position. The first layer of evidence shared by both perceivers must be independent of truth-related factors (since the victim of cognitive penetrability holds a false belief) and must depend only on the phenomenal character of the perceptual experience because this phenomenal content is what the victim and the non-victim share. The first layer of evidence is called phenomenal evidence. The second layer should be sensitive to the fact that the non-victim holds a true belief, while the victim holds a false belief. This sort of evidence is called factive evidence (Schellenberg, 2013, 2014, 2016a,b). Brogaard (2013) makes roughly the same point by introducing the notion of evidence that grounds the percept, which is the sort of evidence that the perceiver whose perceptual experience is cognitively penetrated lacks, but the perceiver whose perceptual experience is cognitively impenetrable possesses.

For externalists, cognitive penetrability may be epistemically damaging because it creates insensitivity to the distal stimulus, and it may be epistemically beneficial if it increases this sensitivity. The insensitivity to the data can happen in two ways. Either the cognitive states affect perceptual processing whereby information is retrieved from the visual scene and shape the proximal stimulus (the proximal stimulus or image is the iconic information that is retrieved from a visual scene during early vision and is stored in visual circuits) in a way that it ceases to reflect the environment and reflects more one's conceptual states (Lyons, 2011, p. 301–302). This would be the case if cognition could affect early vision in a way that interfered with this information retrieval.

Or, cognitive penetrability may be epistemically damaging during late vision where hypotheses about the identity of object(s) in the visual scene are formed and tested against the information contained in the proximal image that is transmitted to late vision by early vision (whose output is the input to the processes of late vision) by selecting from the proximal image only confirming information and either ignoring or rejecting disconfirming information. Cognitive penetrability may also speed up object recognition during late vision by selecting faster the relevant information, while ignoring the irrelevant information, which is one of the ways perceptual learning may affect perception.

Wishful seeing, for example, leads through perception to unjustified beliefs because a viewer's beliefs influence her perception to such an extent that she may see that X is F independent of whether this is true or not. Cognitive penetrability downgrades perception because it makes a viewer's beliefs insufficiently dependent on her environment and bases them more on her prior mental state; this may make the viewer, simply put, see things that are not there. This is what 'the insensitivity to the data' amounts to. "If Jill's belief that Jack is angry makes her less sensitive to his actual mental state, i.e., less likely to get it right, then this is bad penetration; if it makes her more sensitive, then it's good" (Lyons, 2011, p. 301–302). Lyons goes on to argue that the insensitivity to the facts undermines the reliability of perception because it increases the probability that the ensuing perceptual belief will be false, and this is related to the details of how cognition affects perception; it is the nature of the penetration and not the penetrator that determines whether cognitive penetrability is bad.

I have argued that the epistemic criterion for cognitive penetrability entails that to determine whether a perceptual stage is cognitively penetrated one should examine whether there are cognitive influences on this stage that affect its epistemic role in grounding perceptual beliefs. To do that, one should delineate first the epistemic role of the perceptual stages.

The epistemic role of perception in grounding perceptual beliefs centers on, but is not exhausted in, the percept because it is the percept that ultimately grounds the perceptual belief whose content matches the content of the percept. The percept that O is F is formed in late vision because it presupposes that the object and the features in a visual scene have been identified, and this takes place in late vision. It follows that the onus of perceptual justification is on late vision; it is late vision that delivers the most important item in the justification process. The details of the processes by which late vision forms the percept have been discussed elsewhere (Raftopoulos, 2011). For the purposes of my arguments here, it suffices to say that the epistemic role of late vision is affected by cognitive influences and, thus, late vision is cognitively penetrated.

The epistemic role of early vision is determined by the fact that early vision retrieves from the visual scene information that is fed to late vision and is used for the construction of the percept, in the formation of which the semantic information made available by cognition also plays a crucial role. The iconic information delivered by early vision (the proximal image or stimulus) provides the 'evidential' or support basis (should one wish to deny that perception adduces evidence) on which the various hypotheses concerning the identity of objects in the visual scene are formed and tested in late vision. Thus, the role of early vision is to retrieve from the environment the information that will be used by late vision in order for the distal objects in the visual scene to be identified. As I have argued (Raftopoulos, 2009), early vision delivers a structural description of the visual scene that contains information about the 3D shape as viewed from the perceiver, spatio-temporal and surface properties, color, texture, orientation, motion, and affordances of objects, in addition to the representations of objects as bounded, solid entities that persist in space and time.

The problem is to decide whether pre-cueing effects on perception entail that early vision is cognitively penetrable. To do so, one should examine them and determine whether they satisfy the epistemic criterion for cognitive penetrability, that is, whether pre-cueing effects influence the epistemic role of early vision. Since this epistemic role consists in providing late vision with iconic information that is retrieved from the environment, the epistemic role of early vision would be affected by pre-cueing if pre-cueing effects could influence the processes of information retrieval during early vision. If they could, they would affect the sensitivity of early vision to the environmental data and this would render early vision cognitively penetrable.

I claimed above that the pre-cueing effects do not affect the retrieval of information during early vision and, thus, do not influence the proximal image. Thus, the pre-cueing effects do not diminish the sensitivity of early vision to the distal data since all data in the visual scene are retrieved and find their way into the proximal image. This means, in turn, that the pre-cueing does not affect the information that early vision retrieves from a visual scene and is subsequently used in late vision as evidence or the testing ground for the various hypotheses concerning object identity that are formed in late vision. The fact that early vision retrieves from the visual scene all the information that is there, despite the cognitive pre-cueing effects on it, means that the contribution of early vision to the epistemic role of perception is not affected by these cognitive effects and, thus, early vision is not the source of the epistemic downgrade of perception owing to cognitive penetrability. If and when such an etiology emerges, it is due exclusively to the cognitive penetrability of late vision and the way late vision functions. If this is correct, the indirect effects on early vision do not affect the epistemic role of perception; any epistemic effects come from late vision.

# CONCLUSION

If the cognitive states cannot affect the early visual processes that retrieve information, the information contained in the states of early vision is information retrieved from a visual scene independent of any cognitive influences. It follows that one could not shape the evidence on the basis of which hypotheses concerning the identities of objects will be tested. The information retrieved from the visual scene reflects only the environment and the perceptual makeup of the viewer. In addition, pre-cueing effects do not affect the perceptual processes themselves and, thus, do not entail that early vision uses cognitive information, which means that they do not affect early vision directly; they are indirect effects similar to the shifts of overt attention.

### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.



**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Raftopoulos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Prior Knowledge of Object Associations Shapes Attentional Templates and Information Acquisition

#### Rachel Wu<sup>1</sup> \* and Jiaying Zhao<sup>2</sup>

<sup>1</sup> Department of Psychology, University of California, Riverside, Riverside, CA, United States, <sup>2</sup> Department of Psychology and Institute for Resources, Environment and Sustainability, University of British Columbia, Vancouver, BC, Canada

Studies on attentional selection typically use unpredictable and meaningless stimuli, such as simple shapes and oriented lines. The assumption is that using these stimuli minimizes effects due to learning or prior knowledge, such that the task performance indexes a "pure" measure of the underlying cognitive ability. However, prior knowledge of the test stimuli and related stimuli acquired before or during the task impacts performance in meaningful ways. This mini review focuses on prior knowledge of object associations, because it is an important, yet often ignored, aspect of attentional selection. We first briefly review recent studies demonstrating that how objects are selected during visual search depends on the participant's prior experience with other objects associated with the target. These effects appear with both task-relevant and task-irrelevant knowledge. We then review how existing object associations may influence subsequent learning of new information, which is both a driver and a consequence of selection processes. These insights highlight the importance of one aspect of prior knowledge for attentional selection and information acquisition. We briefly discuss how this work with young adults may inform other age groups throughout the lifespan, as learners gradually increase their prior knowledge. Importantly, these insights have implications for developing more accurate measurements of cognitive abilities.

Keywords: attentional selection, visual search, prior knowledge, statistical learning, categorization

#### Edited by:

Gary Lupyan, University of Wisconsin-Madison, United States

#### Reviewed by:

Christian N. L. Olivers, VU University Amsterdam, Netherlands; Neil Gerald Muggleton, National Central University, Taiwan

> \*Correspondence: Rachel Wu rachel.wu@ucr.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 10 November 2016 Accepted: 08 May 2017 Published: 23 May 2017

#### Citation:

Wu R and Zhao J (2017) Prior Knowledge of Object Associations Shapes Attentional Templates and Information Acquisition. Front. Psychol. 8:843. doi: 10.3389/fpsyg.2017.00843

# INTRODUCTION

Studies on attentional selection have typically used unpredictable and meaningless stimuli, such as simple shapes and oriented lines. Even if simple and somewhat meaningful stimuli, such as letters and numerals, are used in a task, different features of the stimuli like color and shape are randomized from trial to trial. The aim of using these unpredictable and minimally meaningful stimuli is to reduce effects due to learning or prior knowledge of the stimuli to obtain a "pure" measure of the underlying cognitive abilities (see Lupyan and Spivey, 2008; Brady et al., 2016). These studies have been instrumental in identifying critical aspects of attentional selection, including the timing and the process of attentional selection. Performance on these tasks, as well as other tasks measuring other cognitive abilities such as working memory, executive function, and inhibition, has been used to determine the range for healthy cognitive development and aging (see Park and Reuter-Lorenz, 2009). These tasks are also used to detect cognitive impairments in aging adults (e.g., Possin et al., 2013). Moreover, cognitive training programs have been using these modularized tasks to improve specific cognitive abilities that they were designed to measure (e.g., Ball et al., 2002; Olesen et al., 2004; Jaeggi et al., 2008).

However, it is difficult, if not impossible, to measure pure cognitive abilities in a single task, because the measurement depends on the participant's experience with the stimuli and related stimuli prior to the experiment, as well as during the task. Stimuli that are meaningless to the experimenter may not actually be meaningless to the participant, and unpredictable stimuli may disrupt the participant's existing cognitive models of a generally predictable environment (e.g., Orhan and Jacobs, 2014). In order to obtain more accurate and ecologically valid measures of cognitive abilities, it is important to investigate the influence of prior knowledge, which impacts performance on cognitive tasks in meaningful ways (e.g., Lupyan, 2008; Lupyan and Spivey, 2008; Orhan and Jacobs, 2014; Brady et al., 2016).

This mini review focuses on prior knowledge of associations between individual objects, because it is an important, yet often ignored, aspect of attentional selection. Objects in the natural environment rarely appear on their own. Instead, they almost always appear with other objects. Therefore, it is important to understand how these associations influence attentional selection and subsequent cognitive processes, namely information acquisition. This paper first reviews recent studies showing that the participant's prior knowledge of objects associated with the search target can either facilitate or hinder search, due to the use and construction of attentional templates. Then, we review studies showing how prior knowledge of associations determines what and how new information is learned, which results from intermediary selection processes. We end with a brief discussion on how this work can inform research with other age groups throughout the lifespan, and aid in developing more accurate measurements of cognitive abilities.

# PRIOR KNOWLEDGE OF TASK-RELEVANT OBJECT ASSOCIATIONS SHAPES ATTENTIONAL TEMPLATES DURING VISUAL SEARCH

Recent studies have shown that prior knowledge of object associations shapes how people search for information in the environment. Top-down, or goal-directed, search has been theorized to unfold in the following manner: The participant creates an attentional template, a prioritized working memory representation, of the to-be-searched target item, and then matches the attentional template to the stimuli presented (e.g., Olivers et al., 2011). Without an attentional template, top-down search is inefficient, perhaps impossible. Attentional templates can contain a single feature, a combination of features, a rule, or even a category (Luck and Hillyard, 1994; Eimer, 1996; Nako et al., 2014a; Wu et al., 2016). In many studies investigating attentional selection, the stimuli are simple objects, rather than complex naturalistic objects, in order to minimize the interference of prior knowledge with the visual search process, and to allow an investigation of the "pure" attentional mechanisms and parameters of attentional selection (e.g., Treisman, 1982; Wolfe, 1998; Woodman and Luck, 1999).

Building on the theoretical foundations of top-down search using simple meaningless stimuli, several recent behavioral studies have demonstrated that prior knowledge of object associations indeed impacts attentional templates and search efficiency (Yang and Zelinsky, 2009; Wolfe et al., 2011). For example, during visual search, participants could recall and recognize objects associated with the target more accurately than unrelated distractors (Moores et al., 2003). Distractors in the same color as the target in the natural environment slowed visual search for the target in the laboratory setting, even if the target was grayscale (Olivers et al., 2011). After knowing the target (e.g., banana), participants were slower to orient toward semantically related objects (e.g., monkey) compared to visually related objects (e.g., canoe), demonstrating that semantic biases can be a distraction when task-irrelevant (De Groot et al., 2016). In essence, prior knowledge has benefits and costs on visual search.

Recent ERP studies using the N2pc component suggest that these behavioral benefits and costs may be due to grouping of associated objects into one unit (e.g., a category; Nako et al., 2014a; Wu et al., 2016). When controlling for factors such as salience, the N2pc ERP reflects the number of attentional templates used during a search task (Nako et al., 2014a). Therefore, the N2pc is a useful tool for investigating the grouping of associated objects into an attentional template. In Nako et al. (2014a), participants searched for a letter target among three simultaneously presented distractors from a number category (and vice versa). ERP results revealed that such category search produced N2pc components similar to those produced by searching for a specific letter target among distractors from the same letter category. In other words, searching for associated objects in one category is similar to searching for a specific object. This finding has been replicated with naturalistic and artificial categories, such as clothing and kitchen items, human and ape faces (Wu et al., 2015), and newly learned Chinese characters and alien families (Wu et al., 2013, 2016). Prior knowledge of object associations also induces costs when distractors are thought to be in the same category as the target or semantically related to the target (Telling et al., 2009; Nako et al., 2014b). For example, when participants are asked to search for the letter "A" but the letter "R" (a distractor from the same category as the target) appears instead, they tend to select the distractor before indicating that the target is absent. In these cases, prior knowledge encourages false alarms to distractors related to the target, resulting in poorer behavioral performance when indicating the absence of the target.

These visual search studies dovetail with an increasing body of research on working memory capacity showing that prior knowledge of object and feature associations allows grouping or "compression" of information to overcome memory capacity limitations (e.g., Brady and Alvarez, 2009; Orhan and Jacobs, 2014; Brady et al., 2016; Zhao and Yu, 2016). Costs of prior knowledge emerge when experimental conditions deviate from the statistics in the familiar environment in which the knowledge was first acquired (Green et al., 2010; Orhan and Jacobs, 2014; Blanco et al., 2016). For example, Orhan and Jacobs (2014) argue that using unpredictable stimuli, such as shapes that do not predict color, may induce a "model mismatch" between the current stimuli and the participant's prior experience in the natural environment, where objects and features are often predictive (e.g., bananas tend to be yellow). This mismatch may negatively impact the participant's responses when completing the task. Relying on prior knowledge allows people to be more efficient in familiar environments, at the cost of being less efficient in novel environments that encompass different constraints.

# PRIOR KNOWLEDGE OF OBJECT ASSOCIATIONS SHAPES THE CONSTRUCTION OF ATTENTIONAL TEMPLATES

Prior knowledge induces costs and benefits on attentional selection because it dictates what is included in search templates. This tradeoff due to prior knowledge is the focus of some recent studies investigating how these costs and benefits emerge with learning and experience. The vast majority of visual search studies provide participants with explicit instructions about the target and sometimes the distractors, and assume that the participant creates an attentional template identical to the target shown, or at least containing the relevant features. However, with more complex stimuli containing many features and more ambiguous instructions, what is considered "relevant" can depend heavily on prior and newly acquired knowledge. This notion is consequential because the construction and use of relevant information for attentional templates typically determine search performance, and everyday activities do not often include simple meaningless stimuli or explicit instructions for every action.

Recent studies suggest that the amount of knowledge about object associations acquired prior to and during a task can determine how attentional templates are constructed. For example, newly acquired categories may be more difficult to find initially, but they elicit fewer false alarms compared to highly familiar categories, such as letters and numerals (Wu et al., 2013, 2016). Unfamiliar categories require learning to construct appropriate attentional templates, which may require learning new rules that may be based on seemingly arbitrary principles (e.g., Chinese characters for numbers, Wu et al., 2013). Therefore, search for newly learned categories may be initially inefficient. As the observer becomes more familiar with the categories, false alarms to the non-target items from the target's category may increase (Wu et al., 2017a). These studies also showed that probabilistic information of object associations can be used to determine which features and objects to prioritize in the attentional template. Such information includes the likelihood of the co-occurrence of objects in a category, which can be used to "chunk" many objects into a unified template (Wu et al., 2013, 2016). For example, Wu et al. (2013) presented English-speaking participants with pairs of objects that belonged in the same category (i.e., Chinese characters for numbers vs. non-numbers), but did not tell them explicitly what the categories were. Participants implicitly extracted the category information based on the co-occurrence of the characters and formed a unified search template for the two categories of Chinese characters, albeit weaker templates than for familiar letters and numerals. These studies highlight that attentional templates are dynamic, task-specific, and dependent on prior knowledge, perhaps even with "minimally meaningful" stimuli such as letters (see also Nako et al., 2015).

# PRIOR KNOWLEDGE OF TASK-IRRELEVANT OBJECT ASSOCIATIONS IMPACTS SEARCH

Prior knowledge of object associations guides the spatial allocation of attention, even when completely task-irrelevant (e.g., Chun and Jiang, 1998; Zhao et al., 2013). In one study (Zhao et al., 2013), "meaningless" abstract novel symbols appeared one after another in a fixed, predictable order (i.e., with regularities), whereas other symbols appeared in a random order. While viewing these "task-irrelevant" objects, participants performed a visual search task where a target (i.e., the letter "T") appeared in either a structured location where objects appeared in a predictable order or a random location where objects appeared in a random order. Participants were faster to detect the target when it appeared in the structured location compared to the random location, suggesting that attention was biased toward the regularities of the object associations in the structured location. This attentional bias persists even when the regularities are later removed, or when new regularities emerge in a different location (Yu and Zhao, 2015). Moreover, depending on how objects co-occur in space, local and global regularities draw attention to local and global levels, respectively (Zhao and Luo, 2017). These studies demonstrate that the prior knowledge of object associations and co-occurrences biases attention to the spatial location containing regularities, possibly in order to facilitate further learning of regularities. This attentional bias can be both beneficial in allowing more learning to occur, and costly in terms of perhaps hindering learning of new information elsewhere.

# PRIOR KNOWLEDGE OF OBJECT ASSOCIATIONS DICTATES HOW NEW INFORMATION IS ACQUIRED

As both a consequence and a driver of the attentional selection process, prior knowledge of object associations can guide how new information is learned and created. Knowledge of object relationships can be acquired automatically and implicitly through statistical learning, which involves the extraction of reliable co-occurrences between individual objects over space and time (e.g., Fiser and Aslin, 2001; Turk-Browne et al., 2005). This ability is present in early infancy (Saffran et al., 1996; Fiser and Aslin, 2002; Kirkham et al., 2002; Wu et al., 2011), and perhaps even from birth (Teinonen et al., 2009). A notable consequence of statistical learning is the generation of the knowledge that certain objects co-occur, and such knowledge is often implicit (Baker et al., 2004; Turk-Browne et al., 2005; Wu et al., 2011, 2013). This learning process occurs incidentally to the task without conscious intent, and can guide the spatial allocation of attention in a spontaneous, implicit, and persistent manner (Zhao et al., 2013; Yu and Zhao, 2015; Zhao and Luo, 2017).
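To make the notion of "reliable co-occurrences" concrete, the following is a minimal sketch that computes transitional probabilities between adjacent items in a stream. Transitional probability is only one possible way to quantify co-occurrence and is an assumption of this illustration; the function name `transitional_probabilities` and the toy stream are hypothetical and are not taken from the studies cited above.

```python
from collections import Counter


def transitional_probabilities(sequence):
    """Estimate P(next item | current item) for each adjacent pair in a stream.

    Illustrative measure only; not a model of how learners actually
    extract regularities.
    """
    # Count how often each ordered pair of neighbors occurs.
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Count how often each item occurs in the "current" position.
    first_counts = Counter(sequence[:-1])
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Hypothetical stream: "A" is always followed by "B"; what follows "B" varies.
stream = ["A", "B", "C", "A", "B", "D", "A", "B", "C"]
tp = transitional_probabilities(stream)
print(tp[("A", "B")])  # reliable pair
print(tp[("B", "C")])  # less reliable pair
```

In this toy stream the pair A-B has a higher transitional probability than B-C or B-D, which is the kind of regularity a statistical learner could, in principle, exploit.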

Recent studies have demonstrated that prior knowledge of how objects are related to each other generates new knowledge of associations (Mole and Zhao, 2016; Luo and Zhao, 2016; Zhao and Yu, 2016). In Luo and Zhao (2016), participants were first exposed to a sequence of colored circle pairs, in which one circle appeared before another in a fixed order. For example, in the AB pair, A appeared before B, and in the BC pair, B appeared before C, where A, B, and C were circles of different colors. After learning the color circle pairs, participants automatically inferred the new color pair AC even though A and C had never appeared together. Both the prior knowledge and the newly acquired knowledge were implicit, in that no participant was explicitly aware of the pairs. Moreover, after acquiring the prior knowledge of one pair at one categorical level, participants implicitly inferred the same association at the subordinate level and the superordinate level, even if the subordinate or superordinate objects had never been presented before. For example, after learning a city pair New York-Vancouver, participants could implicitly infer the corresponding park pair Central Park-Stanley Park, and the corresponding country pair United States-Canada. These results suggest that prior knowledge automatically generates new knowledge of object associations through transitive relations, even outside of explicit awareness. This study with young adults builds on infant studies demonstrating that prior knowledge of older regularities biases learning of new regularities (Marcus et al., 2007; Quinn and Bhatt, 2009; Lew-Williams and Saffran, 2012). Lew-Williams and Saffran (2012) exposed infants to disyllabic or trisyllabic nonsense words, and then a new set of disyllabic or trisyllabic nonsense words. Listening times showed that infants were able to learn words only when the words were uniformly disyllabic or trisyllabic throughout the entire experiment. Previous exposure to disyllabic words impaired the ability to learn trisyllabic words, and vice versa. Thus, prior knowledge about word length produces expectations that facilitate processing of future word information.
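The transitive step described for the AB and BC pairs can be sketched as a small closure computation. This is only an illustration of the logical relation (AB and BC imply AC), not a model of the implicit learning process itself; the function name `infer_transitive_pairs` is hypothetical.

```python
def infer_transitive_pairs(learned_pairs):
    """Return ordered pairs implied by transitivity but never shown directly."""
    known = set(learned_pairs)
    closure = set(known)
    changed = True
    # Repeatedly chain pairs (a, b) and (b, d) into (a, d) until nothing new appears.
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # Keep only the pairs that were inferred rather than directly learned.
    return closure - known


learned = [("A", "B"), ("B", "C")]
print(infer_transitive_pairs(learned))  # {('A', 'C')}
```

Here the only new pair is AC, matching the example in the text; with a longer chain of learned pairs the same procedure would return every pair reachable through intermediate items.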

# USE OF PRIOR KNOWLEDGE ACROSS THE LIFESPAN

Most of the aforementioned studies on the influence of prior knowledge on attentional selection and information acquisition were conducted with young adults (18–30 years of age). Extending these investigations to other age groups across the lifespan, including infants and older adults, would provide a deeper understanding of how prior knowledge may have an increasingly impactful role in determining neural and behavioral outcomes with increased age and experience. One particularly challenging question in developmental psychology is how infants and children learn to engage in top-down goal-driven search to identify and learn about relevant events in the naturalistic, distraction-filled environment (Wu and Kirkham, 2010; Wu et al., 2011). Infants lack extensive prior knowledge due to their minimal exposure to the environment. Therefore, infants' attention is initially driven by stimulus salience (e.g., luminance and high contrast) and biases, such as orienting toward the T configuration resembling faces (e.g., Johnson et al., 1991; Colombo, 2001). Infants rely heavily on external input (e.g., distributional statistics, Aslin and Newport, 2012) to search for information and learn about and from cues in the environment (e.g., social cues, Wu and Kirkham, 2010). For example, infants first learn about regularities of social cues such as direction of gaze, and then use this learned attentional cue to learn about objects by 8 months of age (Wu and Kirkham, 2010; Wu et al., 2014). On the other end of the lifespan, more research is required to investigate a new explanation that the seemingly worse cognitive performance in older adults may not accurately reflect actual cognitive decline, but rather the knowledge acquired over a lifetime (Ramscar et al., 2014; Blanco et al., 2016; Wu et al., 2017b). Ramscar et al. (2014) posit that increased general knowledge may induce retrieval issues that resemble memory decline because the learner has to sift through more prior knowledge, compared to younger age groups, to retrieve a specific piece of information. Wu et al. (2017b) argue that extensive prior knowledge may reduce broad learning experiences, which are prevalent during infancy and childhood. This reduction may encourage an increased reliance on prior knowledge, which may be a key factor driving the effects of apparent cognitive decline in healthy aging adults.

# CONCLUSION

In conclusion, studies that are based on the traditional view of attentional selection being neutral to semantic content (e.g., attending to a spatial location or using simple, meaningless search stimuli) have laid the foundation for the nature and limitations of attentional selection in specific simplified contexts. Recent studies have shown that prior knowledge of object associations influences attentional selection, determines the contents in attentional templates, and subsequently shapes information acquisition in beneficial and costly ways. This point is often underappreciated in research on visual search, as well as other aspects of cognition. Tasks that measure cognitive health across the lifespan typically "remove" the participant's ability to use prior knowledge by using unpredictable meaningless stimuli. Given that prior knowledge can impact the efficiency in completing cognitive tasks, even when the knowledge is task-irrelevant, simple stimuli may underestimate or overestimate a participant's abilities. These insights highlight the importance of the acquisition of appropriate prior knowledge and its use in cognitive tasks both in the laboratory setting, as well as in the natural environment. More research on how prior knowledge interacts with cognitive processes would lead to an increased emphasis on how the use of prior knowledge (e.g., for a search template) is optimized and how and when new information is acquired. Future research also could determine whether cognitive abilities should be conceptually separated from prior knowledge, for example as distinct layers of influence on performance, rather than inherently integrated (see Churchland et al., 1994). These efforts would lead to more accurate assessments of cognitive abilities and more effective training of these processes.


# AUTHOR CONTRIBUTIONS

All authors listed have contributed equally to the work, and approved it for publication.

# FUNDING

Preparation of this manuscript was funded in part by a grant to JZ (NSERC Discovery Grant RGPIN-2014-05617).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Wu and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Effects of Spatial Endogenous Pre-cueing across Eccentricities

Jing Feng<sup>1</sup> \* and Ian Spence<sup>2</sup>

<sup>1</sup> Department of Psychology, North Carolina State University, Raleigh, NC, United States, <sup>2</sup> Department of Psychology, University of Toronto, Toronto, ON, Canada

Frequently, we use expectations about likely locations of a target to guide the allocation of our attention. Despite the importance of this attentional process in everyday tasks, pre-cueing effects on attention, particularly endogenous pre-cueing effects, have been relatively little explored beyond an eccentricity of 20°. Given that the visual field has functional subdivisions, such that attentional processes can differ significantly among the foveal, perifoveal, and more peripheral areas, it remains unclear how endogenous pre-cues that carry spatial information about targets influence the allocation of attention across a large visual field (especially in the more peripheral areas). We present two experiments examining how the expectation of the location of the target shapes the distribution of attention across eccentricities in the visual field. We measured participants' ability to pick out a target among distractors in the visual field after the presentation of a highly valid cue indicating the size of the area in which the target was likely to occur, or the likely direction of the target (left or right side of the display). Our first experiment showed that participants had a higher target detection rate with faster responses, particularly at eccentricities of 20° and 30°. There was also a marginal advantage of pre-cueing effects when trials of the same size cue were blocked compared to when trials were mixed. Experiment 2 demonstrated a higher target detection rate when the target occurred at the cued direction. This pre-cueing effect was greater at larger eccentricities and with a longer cue-target interval. Our findings on the endogenous pre-cueing effects across a large visual area were summarized using a simple model to assist in conceptualizing the modifications of the distribution of attention over the visual field. We discuss our findings in light of cognitive penetration of perception, and highlight the importance of examining attentional processes across a large area of the visual field.

Keywords: attentional visual field, pre-cueing, endogenous, covert attention, eccentricity

#### Edited by:

Gary Lupyan, University of Wisconsin-Madison, United States

#### Reviewed by:

Britt Anderson, University of Waterloo, Canada; Adrian Von Muhlenen, University of Warwick, United Kingdom

\*Correspondence: Jing Feng, jing\_feng@ncsu.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 09 October 2016 Accepted: 15 May 2017 Published: 07 June 2017

#### Citation:

Feng J and Spence I (2017) The Effects of Spatial Endogenous Pre-cueing across Eccentricities. Front. Psychol. 8:888. doi: 10.3389/fpsyg.2017.00888

# INTRODUCTION

Expectation about likely locations of a target guides our attention and is essential to efficient identification of important information when interacting with a complex environment (Posner, 1980; Carrasco, 2014; Peelen and Kastner, 2014). For example, when searching for a friend whom one will pick up while driving, the driver allocates attention to the sides of the road rather than the middle of the street. Such expectation, sculpted by target familiarity, memory and scene context (Peelen and Kastner, 2014), is often studied in the laboratory setting using cues indicating the likely location of a target before its presentation. Studies have shown that expectations induced by pre-cues are powerful and operate at very early stages of processing, often even before the stimulus is present (Mangun et al., 1998; Giesbrecht et al., 2006). More specifically, a pre-cue that indicates target locations can enhance spatial resolution at these locations (Yeshurun and Carrasco, 1999), reduce the spatial extent of crowding (Yeshurun and Rashal, 2010), improve perceptual quality (Anderson and Druker, 2013), and affect attentional selection by enhancing the neural response at the locations thus biasing competition favorably toward stimuli at these locations (Kastner and Ungerleider, 2001). This enhancement, as measured by electroencephalogram (EEG) and single cell response, has been seen at both lower and higher levels of the visual cortex including V1 (Motter, 1993), V2 (Motter, 1993; Luck et al., 1997), and V4 (Motter, 1993; Connor et al., 1996; Connor et al., 1997). As a result, information at the expected locations is enhanced, as shown by improved accuracy and faster response speed in identifying a target (Posner, 1980; Carrasco and Yeshurun, 1998; Staugaard et al., 2016).

In research on pre-cueing effects, several studies have reported a significant role of stimulus eccentricity. In one study (Yeshurun and Carrasco, 1998), when attention was drawn by a cue that appeared at the target location before the stimulus display, the pre-cue improved participants' performance on a texture segregation task at peripheral locations but impaired performance at foveal and parafoveal locations. Two other studies (Bao and Pöppel, 2007; Bao et al., 2013) examined inhibition of return, an attentional phenomenon in which target identification is first enhanced but then impaired by a pre-cue that appeared before the target at the same location, and found significantly stronger inhibition of return in the periphery than in the foveal and perifoveal areas (up to 15° of eccentricity). Functional subdivisions across eccentricities ranging from the foveal to peripheral areas in the visual field have been speculated to be related to the inhomogeneity of the visual field at the physiological and neuroanatomical levels (for a review, see Strasburger et al., 2011), including cortical and subcortical mechanisms (Cowey and Rolls, 1974; Popovic and Sjöstrand, 2001; Bao and Pöppel, 2007; Bao et al., 2013).

Most studies that examined pre-cueing effects on spatial attention have presented stimuli within an eccentricity of around 20°. Very rarely, stimuli were presented outside this area [e.g., Bao et al. (2013) compared attentional processing at 7°–21°]. However, as visual processing starts to show an abrupt change at around 20° of eccentricity, we may not be able to use our understanding of visual attentional processing inside 20° of eccentricity to draw inferences about processing in more peripheral areas. For example, the velocity of a saccade with an amplitude of up to 20° increases linearly with the amplitude. However, when a saccade's amplitude goes beyond that, its velocity starts to plateau and the change becomes non-linear with the amplitude (Bahill et al., 1975). In addition, our ability to hold gaze stable also declines more quickly outside 20° (Bertolini et al., 2013). In daily life, when we are free to move our heads, a shift of gaze larger than 20° is commonly accompanied by a head movement. Given the intense coupling between attention and saccades (Sheliga et al., 1994; Hoffman and Subramaniam, 1995), it is reasonable to speculate that attentional processing may differ considerably inside vs. outside 20° of eccentricity. Therefore, research on pre-cueing effects of spatial attention needs to expand more into the periphery.

There are two types of pre-cues influencing spatial attentional processing. An exogenous pre-cue occurs at a peripheral location and automatically attracts attention to the location, whereas an endogenous pre-cue appears centrally and indicates where attention should be allocated. Although both exogenous and endogenous pre-cues affect early visual processing (e.g., Brefczynski and DeYoe, 1999; Gandhi et al., 1999; Pinsk et al., 2004), much evidence points to differential mechanisms of the two. The impact of an exogenous pre-cue is stimulus-driven, involuntary, quick and transient; in contrast, the impact of an endogenous pre-cue is concept-driven, voluntary, slower but more sustained (e.g., Posner, 1980; Jonides, 1981; Nakayama and Mackeben, 1989; Ling and Carrasco, 2006). Attentional shifts associated with an exogenous pre-cue depend minimally on distance; however, shifts associated with an endogenous pre-cue are significantly affected by distance (Chakravarthi and VanRullen, 2011). Compared with exogenous pre-cueing, endogenous pre-cueing involves more cognitive control, and could be a potential mechanism for cognitive penetration of perception. For example, attentional shifts to spatial locations implied by an endogenous cue depend on the validity of the cue (Sperling and Melchner, 1978; Mangun and Hillyard, 1990; Giordano et al., 2009). In addition, several recent studies have demonstrated that the probabilities of targets' occurrence in various areas of the visual field can be learnt and guide the distribution of attention (Geng and Behrmann, 2002; Druker and Anderson, 2010). In contrast, an exogenous pre-cue automatically attracts attention even when the cue is uninformative (Pestilli et al., 2007; Montagna et al., 2009; Yeshurun and Rashal, 2010). Previous studies that examined eccentricity effects on pre-cueing have predominantly used exogenous pre-cues. Therefore, investigation of endogenous pre-cues with the target occurring across a wide range of eccentricities (particularly beyond 20°) is needed. Such exploration will provide valuable evidence for cognitive penetrability of visual perceptual processing (for a review, see Lupyan, 2015), and particularly on the impacts of endogenous spatial attention on early vision across a large area of the visual field.

In the present study, we used a task measuring the spatial distribution of attention across an extended area of the visual field (Attentional Visual Field Task; Spence et al., 2013; Feng and Spence, 2014). On this task, the distribution of attention across a large area of the visual field is reflected in target detection performance, which decreases as target eccentricity increases (e.g., Feng and Spence, 2014; Feng et al., 2016). In the current study, we implemented endogenous cues that occurred before the stimulus displays, to examine the pre-cueing effects across a wide range of visual eccentricities (10°, 20°, and 30°). To ensure that we were measuring early visual processing (within 120 ms after stimulus presentation; Raftopoulos, 2009; Raftopoulos and Zeimbekis, 2015), the stimuli were displayed very briefly (20 or 30 ms) and followed by a mask in the experiments. Our study also focused on covert attention, that is, the orienting of attention without an eye movement, although covert attention usually contributes to a subsequent eye movement to the attended location (Peterson et al., 2004). We designed the tasks so that either an eye movement would not be very useful (Experiment 1 on modifying the size of the to-be-attended area), or the time interval between the onsets of a pre-cue and a target was too brief to allow the execution of an eye movement (which normally takes at least 200 ms; Johnson and Proctor, 2004). In this study, we investigated two types of endogenous pre-cues that provide location information about a target: (1) a pre-cue that indicated the size of the area in which a target is likely to occur (Experiment 1), and (2) a pre-cue that showed the direction of a target in the visual field (Experiment 2).

# EXPERIMENT 1

Experiments 1A and 1B used cues that indicated the size of the area in which a target was likely to occur. Participants were instructed to make use of the cue and respond both accurately and quickly on the Attentional Visual Field (AVF) task. The experimental procedures were approved by the University of Toronto Ethics Review Board.

# Experiment 1A

A cue was presented only once, at the beginning of each block of trials. Therefore, the size of the area indicated by the cue was identical through the entire block of trials. Because the expectation of the size of the area was formed at the beginning of each block, the participant did not need to adjust the expectation on every trial, thus minimizing the cognitive overhead.

#### Methods

#### **Participants**

Fifteen undergraduates at the University of Toronto (six males, nine females; age range: 18–22 years) participated for course credit. All participants reported normal or corrected-to-normal vision.

#### **Stimuli**

An AVF task was used to examine the distribution of attention. Before each block of trials, a cue indicating the likely size of the area containing the target was presented (**Figure 1A**). The cue to a small area was a dark-gray unfilled circle (2.2° × 2.2°); to a medium area, the cue was two dark-gray concentric unfilled circles (3.6° × 3.6°); and to a large area, the cue was three dark-gray concentric unfilled circles (4.5° × 4.5°). In each trial of the AVF task (**Figure 1B**), the stimuli were presented in a circular area (63.1° diameter) centered on a uniform light-gray screen. Each trial began with a centered, unfilled fixation square with a dark-gray border (3° × 3°) presented for 800 ms. The stimulus display consisted of 23 identical distractors and one target, each uniquely localized at an eccentricity of 10°, 20°, or 30° in one of eight equally spaced directions. The location of the target was randomly selected on each trial, subject to the restriction that the target appeared an equal number of times in each possible location over the block of trials. The target was a dark-gray filled square (1.5° × 1.5°) surrounded by an unfilled circle with a dark-gray circumference (3° × 3°). The distractors were unfilled squares with dark-gray borders (3° × 3°), identical to the fixation square. The stimulus display was presented for 30 ms, followed by a mask of randomly oriented dark-gray lines for 200 ms. Participants indicated the direction of the target after the mask disappeared. The next trial started 1000 ms after a response was made.
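The stimulus geometry described above can be sketched in a few lines. This is only an illustration, not the authors' C++ implementation: it assumes a flat screen perpendicular to the line of sight, uses the 35 cm viewing distance given in the Procedure, and arbitrarily takes direction 0° to mean "rightward" (the text does not say how the eight directions were oriented). The function names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 35.0         # viewing distance stated in the Procedure
ECCENTRICITIES_DEG = (10, 20, 30)  # target/distractor eccentricities
N_DIRECTIONS = 8                   # eight equally spaced directions


def deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen distance from fixation for a given eccentricity.

    Assumes a flat screen perpendicular to the line of sight.
    """
    return distance_cm * math.tan(math.radians(angle_deg))


def stimulus_locations():
    """Return (eccentricity_deg, direction_deg, x_cm, y_cm) for all 24 locations.

    Direction 0 deg is taken as rightward; this orientation is an assumption.
    """
    locations = []
    for ecc in ECCENTRICITIES_DEG:
        radius_cm = deg_to_cm(ecc)
        for k in range(N_DIRECTIONS):
            direction = k * 360.0 / N_DIRECTIONS
            x = radius_cm * math.cos(math.radians(direction))
            y = radius_cm * math.sin(math.radians(direction))
            locations.append((ecc, direction, x, y))
    return locations


for ecc, direction, x, y in stimulus_locations():
    print(f"{ecc:>2} deg, direction {direction:5.1f} deg: x = {x:7.2f} cm, y = {y:7.2f} cm")
```

Under the same flat-screen assumption, the on-screen diameter of the 63.1° display area would be `2 * deg_to_cm(63.1 / 2)`.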

#### **Design**

The experimental design was a completely within-participant 3 × 3 repeated-measures design. Cued area size (small/medium/large) was a block factor, and target eccentricity (10°/20°/30°) was varied within each block. There were three blocks for each cued area size, and the order of the blocks was counterbalanced.

#### **Procedure**

Before the experiment, the meanings of the size cues were explained to participants: a small size cue (only one small circle) indicated that the target was likely to occur only at an eccentricity of 10°; a medium size cue (two concentric circles) indicated that the target was likely to occur at an eccentricity of either 10° or 20°; and a large size cue (three concentric circles) indicated that the target was likely to occur at any eccentricity: 10°, 20°, or 30°. Participants positioned their head on a chin rest at a distance of 35 cm from the display. The AVF task was programmed in C++ (Microsoft Visual Studio) and administered on a PC in the lab. A practice session consisting of 36 trials was required to ensure that participants understood the task. The 36 practice trials were grouped into three blocks: 12 trials for each area size cue. In the experimental session, trials with the same size cue were blocked and repeated three times, for a total of nine blocks that were counterbalanced using a Latin square. There were 72 trials in each block with a large size cue. Because a large cue would be valid for a target appearing at any of the three eccentricities, every trial with a large size cue was valid. The number of trials in the block, 72, was a multiple of the 24 possible locations of the target. In contrast, there were 80 trials in each block with a medium or small cue because these blocks contained both valid and invalid trials (80% cue validity): a small or medium size cue was invalid for a target appearing at an eccentricity outside the to-be-attended area (an eccentricity of 20° or 30° with a small size cue, or an eccentricity of 30° with a medium size cue). Participants saw the size cue before each block and were asked to maintain the same expectation induced by this cue throughout the block. Participants were given a 2-min rest after each block. The direction of the target indicated by the participant on each trial and the response time were recorded.
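The block sizes and the 80% validity can be checked with simple arithmetic. The sketch below is illustrative only: the trial counts and validities come from the text, but the even spread of valid trials over cued locations and of invalid trials over uncued locations is an assumption, since the per-location counts within a block are not reported.

```python
LOCATIONS_PER_ECCENTRICITY = 8  # eight directions at each eccentricity
ALL_ECCENTRICITIES = (10, 20, 30)

# Eccentricities (deg) covered by each size cue, as explained to participants.
CUED_ECCENTRICITIES = {
    "small": (10,),
    "medium": (10, 20),
    "large": (10, 20, 30),
}
TRIALS_PER_BLOCK = {"small": 80, "medium": 80, "large": 72}
VALIDITY = {"small": 0.8, "medium": 0.8, "large": 1.0}


def block_breakdown(cue):
    """Split one block into valid/invalid trials and per-location repetitions.

    Assumption (not stated in the text): valid trials are spread evenly over
    the cued locations and invalid trials evenly over the uncued locations.
    """
    total = TRIALS_PER_BLOCK[cue]
    n_valid = round(total * VALIDITY[cue])
    n_invalid = total - n_valid
    cued = len(CUED_ECCENTRICITIES[cue]) * LOCATIONS_PER_ECCENTRICITY
    uncued = len(ALL_ECCENTRICITIES) * LOCATIONS_PER_ECCENTRICITY - cued
    return {
        "valid_trials": n_valid,
        "invalid_trials": n_invalid,
        "valid_per_location": n_valid / cued,
        "invalid_per_location": n_invalid / uncued if uncued else 0.0,
    }


for cue in ("small", "medium", "large"):
    print(cue, block_breakdown(cue))
```

Under this assumption every count is a whole number, which may be one reason 80 rather than 72 trials was convenient for the small and medium cue blocks.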

#### Results

A 3 × 3 (cued size: small/medium/large, target eccentricity: 10°/20°/30°) repeated-measures ANOVA was used to analyze the percentage of correct responses and response time data. We calculated the percentage of correct responses and average response time based on all trials of each combination of conditions.
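As a rough illustration of the aggregation step described here, the following sketch computes percentage correct and mean response time for each cue × eccentricity cell from a list of trial records. The record layout (keys `cue`, `eccentricity`, `correct`, `rt_ms`) and the demo values are hypothetical and are not the authors' data format; following the text, response time is averaged over all trials in a cell, not only correct ones.

```python
from collections import defaultdict


def summarize(trials):
    """Aggregate trial records into per-condition accuracy and mean RT.

    Each trial is a dict with hypothetical keys 'cue', 'eccentricity',
    'correct' (bool) and 'rt_ms' (float).
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t["cue"], t["eccentricity"])].append(t)

    summary = {}
    for cell, rows in cells.items():
        n = len(rows)
        summary[cell] = {
            "n": n,
            "percent_correct": 100.0 * sum(r["correct"] for r in rows) / n,
            "mean_rt_ms": sum(r["rt_ms"] for r in rows) / n,
        }
    return summary


# Made-up records, for illustration only.
demo_trials = [
    {"cue": "small", "eccentricity": 10, "correct": True, "rt_ms": 400.0},
    {"cue": "small", "eccentricity": 10, "correct": False, "rt_ms": 600.0},
    {"cue": "large", "eccentricity": 30, "correct": True, "rt_ms": 500.0},
]
for cell, stats in summarize(demo_trials).items():
    print(cell, stats)
```

The per-participant cell means produced this way would then be the input to the repeated-measures ANOVA.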

#### **Percentage correct**

Overall accuracy on target detection differed significantly among eccentricities (10°: 81%, 20°: 68%, 30°: 48%) (**Figure 1C**, left panel), F(2,28) = 60.19, p < 0.001. In particular, accuracy was higher at 10° than 20°, F(1,14) = 34.02, p < 0.001, and also higher at an eccentricity of 20° than 30°, F(1,14) = 51.75, p < 0.001. Overall accuracy varied with cue size (small cue: 62%, medium cue: 67%, large cue: 68%), F(2,28) = 7.05, p < 0.01. Subsequent analyses revealed a significant difference between a small cue and a medium cue, F(1,14) = 12.95, p < 0.01, and between a small and a large cue, F(1,14) = 9.31, p < 0.01. There was a significant interaction between expected size and target eccentricity, F(2,86) = 15.41, p < 0.01. In particular, accuracy differed significantly among cued sizes at an eccentricity of 30°, F(2,28) = 14.79, p < 0.001, and at an eccentricity of 20°, F(2,28) = 3.77, p < 0.05, but not at an eccentricity of 10°, F(2,28) = 2.39, p = 0.11.

#### **Response time**

Response speed differed among eccentricities of the target (**Figure 1C**, right panel), F(2,28) = 6.27, p < 0.01. The interaction between cued size and target eccentricity was significant, F(4,56) = 2.68, p < 0.05. Slower responses were associated with lower accuracies (**Figure 1C**), suggesting that there was no speed-accuracy trade-off.

# Experiment 1B

Experiment 1A demonstrated that the attended area could be modified by a cue that indicated the likely eccentricity of the target. This experiment examined whether a cue that varied unpredictably would still be effective when presented before each trial rather than before the block of trials. Thus the time available to make use of the cue was much shorter. Otherwise, the task was identical to that in Experiment 1A. Since changing the size of the attended area takes processing time (Eriksen and St. James, 1986), the influence of the cue may not be as large as in Experiment 1A.

#### Methods

#### **Participants**

Fifteen undergraduates at the University of Toronto (5 males, 10 females; age range: 17–22 years) participated for course credit. All participants reported normal or corrected-to-normal vision.

#### **Task**

All settings were the same as in Experiment 1A, except the cue was presented after the fixation and before the stimulus on each trial (**Figure 2A**). Each cue was presented for 500 ms and followed by a 300 ms interval, before the onset of the stimulus display.

#### **Design**

This experiment adopted a completely within-participant 3 × 3 repeated-measures design. The factors were cued area size (small/medium/large) and target eccentricity (10°/20°/30°). Trials of all combinations of cued area size and target eccentricity were mixed and randomized.

#### **Procedure**

The procedure was the same as in Experiment 1A, except that the order of the trials was randomized. There were 36 randomized practice trials. In the experimental session, 720 trials were presented in random sequence with the target appearing 30 times in each of the 24 locations. Overall, 80% of the trials were valid, that is, the cued size was equal to or larger than the eccentricity of the presented target (cue validity was 100% for a large size cue, 88% for a medium size cue, and 53% for a small size cue). Participants were given a 2-min rest after each block of 120 trials. The direction of the target indicated by the participant on each trial and the response time were recorded.
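The overall validity of about 80% follows from the three per-cue validities if each cue was shown equally often. That equal split (240 trials per cue) is an assumption made here for illustration, since the number of trials per cue is not reported; the function name is hypothetical.

```python
TOTAL_TRIALS = 720
# Per-cue validities as reported in the text.
VALIDITY_BY_CUE = {"large": 1.00, "medium": 0.88, "small": 0.53}


def overall_validity(validity_by_cue, total_trials=TOTAL_TRIALS):
    """Overall proportion of valid trials.

    Assumption (not stated in the text): the three cues were presented
    equally often.
    """
    trials_per_cue = total_trials / len(validity_by_cue)
    valid_trials = sum(v * trials_per_cue for v in validity_by_cue.values())
    return valid_trials / total_trials


print(f"overall validity: {overall_validity(VALIDITY_BY_CUE):.1%}")
```

With equal cue frequencies the result is simply the mean of the three validities, which is close to the stated 80%.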

#### Results

A 3 × 3 (cued size: small/medium/large, target eccentricity: 10°/20°/30°) repeated-measures ANOVA was used to analyze the percentage of correct responses and response time data. We calculated the percentage of correct responses and average response time based on all trials of each combination of conditions.

#### **Percentage correct**

Target detection differed among eccentricities (10°: 82%, 20°: 70%, 30°: 49%) (**Figure 2B**, left panel), F(2,28) = 50.34, p < 0.01. Subsequent contrasts revealed that accuracy was higher at 10° than 20°, F(1,14) = 13.70, p < 0.001, and also higher at an eccentricity of 20° than 30°, F(1,14) = 69.19, p < 0.001. Varying the size of the cued area did not change overall performance (small cue: 66%, medium cue: 68%, large cue: 68%), F(2,28) = 0.40, p = 0.67; however, the interaction between cue and eccentricity was significant (**Figure 2B**, left panel), indicating that the distribution of attention was modified according to the cued area size, F(4,56) = 5.34, p < 0.01.

#### **Response time**

Response speed differed among eccentricities (10°: 429 ms, 20°: 484 ms, 30°: 580 ms) (**Figure 2B**, right panel), F(2,28) = 8.68, p = 0.001. The interaction between cue and eccentricity was also significant, F(4,56) = 4.68, p < 0.01. Slower responding was associated with lower accuracy (**Figure 2B**), suggesting that there was no speed-accuracy trade-off.
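The reasoning about the speed-accuracy trade-off can be made concrete with the condition means reported above for Experiment 1B: a trade-off would show up as higher accuracy going together with slower responses, that is, a positive relation between accuracy and response time. The sketch below computes a plain Pearson correlation over the three eccentricity means. It is purely descriptive (three points cannot support inference) and is not an analysis the authors report.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Means reported for Experiment 1B at 10, 20 and 30 deg of eccentricity.
accuracy_percent = [82, 70, 49]
response_time_ms = [429, 484, 580]

r = pearson_r(accuracy_percent, response_time_ms)
# A negative r means accuracy falls as responses slow down, which is the
# opposite of what a speed-accuracy trade-off would produce.
print(f"r = {r:.3f}")
```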

Given that the pre-cueing effect was more visible in Experiment 1A than in Experiment 1B, we conducted a statistical comparison of the pre-cueing effects in the two experiments using a 3 × 3 × 2 mixed-design ANOVA on both accuracy and response time. The within-subject factors were cued area size (small/medium/large) and target eccentricity (10°/20°/30°). The between-subject factor was cue style (blocked/mixed). There was a marginally significant interaction between cued area size and cue style on accuracy, F(2,56) = 3.06, p = 0.05, but not on response time, F(2,56) = 0.50, p = 0.61.
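For readers who want to relate the reported F ratios to the designs, the degrees of freedom of within-participant effects can be derived from the number of factor levels and participants. The helper below is a generic textbook calculation (assuming no sphericity correction), not code from the study; the function name and arguments are hypothetical.

```python
def rm_df(within_levels, n_participants, n_groups=1, by_group=False):
    """Degrees of freedom (effect, error) for a within-participant effect.

    within_levels: numbers of levels of the within factors in the effect,
        e.g. (3,) for a main effect or (3, 3) for a two-way interaction.
    n_groups: number of between-participant groups (1 = fully within design).
    by_group: True for the interaction of the within effect with the group
        factor in a mixed design.
    Assumes no sphericity correction.
    """
    within = 1
    for levels in within_levels:
        within *= levels - 1
    error = within * (n_participants - n_groups)
    effect = within * (n_groups - 1) if by_group else within
    return effect, error


# Experiment 1A/1B (15 participants each): main effect and 3 x 3 interaction.
print(rm_df((3,), 15), rm_df((3, 3), 15))
# Combined analysis (30 participants, 2 groups): cued area size x cue style.
print(rm_df((3,), 30, n_groups=2, by_group=True))
```

For the combined analysis this gives 2 and 56 degrees of freedom, matching the F(2,56) values reported for the cued area size × cue style interaction.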

FIGURE 2 | (A) A sample trial of the AVF task with an area size cue in Experiment 1B. An endogenous cue indicating the size of the area in which the target was likely to appear was given immediately following the fixation in each trial. (B) Percentage correct (left panel) and response time (right panel) on the AVF task with an area size cue.

We also investigated the difficulty of increasing, decreasing, or keeping the to-be-attended area constant from one trial to the next using a single-factor (area change, three conditions: increasing, unchanging, and decreasing) repeated-measures ANOVA. So that the attended area could increase, decrease, or remain unchanged relative to the previous trial, only valid trials with medium size cues were included in this analysis (e.g., a trial with a large size cue could not be the result of a decrease in the attended area). A trial (with a medium size cue) preceded by a trial with a large cue was included in the decreasing condition: participants had to reduce the size of the to-be-attended area from a large size in the previous trial to a medium size in the current trial. Similarly, a trial preceded by a trial with a medium cue was included in the unchanging condition, and a trial preceded by a trial with a small cue was included in the increasing condition. The percentages of correct responses in the three conditions were comparable (decreasing: 74%, unchanging: 74%, increasing: 76%), F(2,28) = 0.97, p = 0.39. In terms of response time, there was a trend toward different response speeds among the three conditions (decreasing: 551 ms, unchanging: 504 ms, increasing: 524 ms), F(2,28) = 3.08, p = 0.06. Compared to the unchanging condition, responses were significantly slower when participants had to decrease the to-be-attended area, F(1,14) = 7.49, p < 0.05, but comparable when they had to increase the area, F(1,14) = 0.73, p = 0.41.
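The trial selection for this sequential analysis can be sketched as follows. The function and variable names are hypothetical, and two details are assumptions rather than reported procedure: the first trial of a sequence is skipped because it has no predecessor, and rest breaks are ignored when pairing a trial with the one before it.

```python
SIZE_RANK = {"small": 1, "medium": 2, "large": 3}


def classify_area_change(cues, valid):
    """Label valid medium-cue trials by the change from the previous cue.

    cues: cue names ('small', 'medium', 'large') in presentation order.
    valid: booleans, True if the cue was valid on that trial.
    Returns a list of (trial_index, label) pairs, where label is
    'increasing', 'unchanging' or 'decreasing'.
    """
    labels = []
    for i in range(1, len(cues)):  # the first trial has no predecessor
        if cues[i] != "medium" or not valid[i]:
            continue
        change = SIZE_RANK[cues[i]] - SIZE_RANK[cues[i - 1]]
        if change > 0:
            label = "increasing"
        elif change < 0:
            label = "decreasing"
        else:
            label = "unchanging"
        labels.append((i, label))
    return labels


# Made-up cue sequence, for illustration only.
demo_cues = ["large", "medium", "medium", "small", "medium", "large"]
demo_valid = [True, True, True, True, False, True]
print(classify_area_change(demo_cues, demo_valid))
```

In the demo sequence, the medium-cue trial at index 4 is dropped because it is invalid, mirroring the restriction to valid trials described above.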

### Discussion

Our findings suggest that expectation modified the size of the attended area and hence attentional processing at eccentricities of 20° and 30°. These findings are in line with the proposal of Titchener (1908) and Eriksen and St. James (1986) that the distribution of attention can be modified according to the participant's expectation. For example, when highly focused on a primary central task, participants were less capable of noticing stimuli presented outside the area of the primary task (Ikeda and Takeuchi, 1975; Williams, 1995). Using the same size cue throughout a block of trials (Experiment 1A) was marginally more effective in modifying the distribution of attention than presenting a potentially different size cue on each trial (Experiment 1B). This is at variance with other experiments that used blocked trials (a procedure similar to that of Experiment 1A), in which no differences were observed as a result of manipulating the spatial cues (Posner, 1978; Remington and Pierce, 1984). However, an experiment similar to Experiment 1B, with randomized trials, did show differences (Downing and Pinker, 1985). These discrepancies may lie in the nature of the tasks used in the studies. One possible reason is that, in our Experiment 1B, requiring participants to use a potentially different size cue on each trial must increase the cognitive workload, so it is unsurprising that performance should suffer relative to the blocked trials of Experiment 1A. Another explanation is that frequent short-term repetition of target locations may lead to stronger cueing effects from learnt statistics. Walthew and Gilchrist (2006) found that while participants learnt and used statistics of the target location to guide their detection, the benefits of such an endogenous cue were eliminated when short-term repetitions of target locations were restricted. It may be the case that in our Experiment 1, the repetitions of target locations were higher with the blocked design (Experiment 1A) than with the randomized trial design (Experiment 1B). However, our finding that the cueing benefits were greater at larger eccentricities is notable given that the highest repetition would have occurred in the small size cue blocks (i.e., when the target frequently occurred at the eccentricity of 10°). It is important to note that another study found that participants were capable of learning relatively complex statistical patterns (Druker and Anderson, 2010). Therefore, more complex patterns over trials may have also played a role in influencing our participants' allocation of attention in our experiment.

# EXPERIMENT 2

We examined the influence of a cue indicating the likely direction of the target. This directional cue was highly valid (67% in Experiment 2A and 80% in Experiment 2B) to encourage participants to use the cue (Jonides, 1981; Kröse and Julesz, 1989; Wright and Ward, 2008). The validities were convenient choices based on the number of conditions and repetitions in each experiment. Participants reported the direction (Experiment 2A) or identity (Experiment 2B) of the target. During covert orienting of attention, only the attentional focus, but not the fixation, is shifted. To help ensure that the participant maintained fixation during each trial, the duration from onset of the cue to offset of the stimulus was limited to 200 ms. Thus the likelihood of a saccade occurring during processing of the stimulus was minimized.

# Experiment 2A

An arrow (67% valid) indicating the likely direction of the target was presented before the AVF stimulus appeared. Participants reported the direction of the target.

#### Methods

#### **Participants**

Fourteen undergraduates at the University of Toronto (five males, nine females; age range: 18–21 years) participated for course credit. All participants reported normal or corrected-to-normal vision.

#### **Task**

This experiment used the AVF task with an endogenous cue (an arrow) indicating the likely direction of the target between the fixation and the stimulus display (**Figure 3A**).

#### **Design**

We used a 3 × 3 × 2 within-subject repeated-measures design, whose factors were target eccentricity (10°/20°/30°), cue-target interval (0/50/100 ms) and cue validity (valid/invalid).

#### **Stimuli**

The AVF task was very similar to the one used in Experiment 1B, except for the number of distractors, the cue, and the exposure settings. In this experiment, the directional cue, a dark-gray arrow (3° × 3°), was presented at the center of the screen. The cue pointed in one of four directions (up, down, left, or right). During each trial, the cue remained on screen for 80 ms and was followed by a blank display (0, 50, or 100 ms) and then the stimulus, which consisted of 11 identical distractors and one target, each uniquely localized at an eccentricity of 10°, 20°, or 30° in one of four equally spaced directions (up, down, left, or right). The location of the target was randomly selected on each trial, subject to the restriction that the target appeared an equal number of times in each possible location. The stimulus display was presented for 20 ms, followed by a mask of randomly oriented lines for 200 ms. Participants indicated the direction of the target after the mask disappeared. The next trial started 800 ms after a response was made.
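The timing values above can be checked against the 200 ms limit from cue onset to stimulus offset mentioned in the overview of Experiment 2. The following sketch simply adds up the stated durations; the function name is hypothetical.

```python
CUE_MS = 80                # cue duration
BLANK_MS = (0, 50, 100)    # blank cue-target intervals
STIMULUS_MS = 20           # stimulus display duration
LIMIT_MS = 200             # stated limit from cue onset to stimulus offset


def cue_onset_to_stimulus_offset(blank_ms, cue_ms=CUE_MS, stimulus_ms=STIMULUS_MS):
    """Total time from cue onset to stimulus offset for one blank interval."""
    return cue_ms + blank_ms + stimulus_ms


for blank in BLANK_MS:
    total = cue_onset_to_stimulus_offset(blank)
    print(f"blank {blank:>3} ms: cue onset to stimulus offset = {total} ms "
          f"(within limit: {total <= LIMIT_MS})")
```

Even with the longest blank interval, the stimulus is gone 200 ms after cue onset, which is about the minimum time usually needed to execute a saccade.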

#### **Procedure**

Participants were required to position their head on a chin rest at a distance of 35 cm from the screen. A practice session of 24 trials was used to ensure that participants understood the task. In the experimental session, 648 trials were presented in a random sequence with the target appearing 54 times in each of the 12 locations for each cue-target interval condition. Sixty-seven percent of the trials were valid (the cue arrow correctly pointed to the target). Participants were informed that the cue was valid in 67% of the trials, to encourage them to use the cue. In the invalid trials, the cue arrow pointed in one of the other three directions, at random with equal frequency. Participants were given a 2-min rest after each set of 108 trials. The direction of the target indicated by the participant and the reaction time on each trial were recorded.

#### Results

A 3 × 3 × 2 (target eccentricity: 10°/20°/30°, cue-target interval: 0/50/100 ms, cue validity: valid/invalid) repeated-measures ANOVA was used to analyze the percentage of correct responses and response time data. We calculated the percentage of correct responses and average response time based on all trials of each combination of conditions.

#### **Percentage correct**

Target detection differed significantly among the three eccentricities (10°: 87%, 20°: 78%, 30°: 65%) (**Figure 3B**, three right panels), F(2,26) = 55.45, p < 0.001. Subsequent contrasts suggested that accuracy in target detection was higher at an eccentricity of 10° than 20°, F(1,13) = 74.08, p < 0.001, and also higher at an eccentricity of 20° than 30°, F(1,13) = 35.11, p < 0.001. The mean accuracy on the valid trials was higher than that on the invalid trials (valid: 87%, invalid: 66%) (**Figure 3B**), F(1,13) = 70.60, p < 0.001. The duration of the cue-target interval also had an impact on performance. With varying intervals, overall accuracy (including both valid and invalid trials) differed significantly (0 ms: 77%, 50 ms: 77%, 100 ms: 75%), F(2,26) = 3.48, p < 0.05. Subsequent contrasts among individual conditions revealed a significant difference between 0 and 100 ms, F(1,13) = 9.87, p < 0.01. There was a significant interaction between validity and cue-target interval, F(2,26) = 3.91, p < 0.05. Subsequent analyses showed that accuracy on invalid trials varied significantly among intervals (0 ms: 68%, 50 ms: 66%, 100 ms: 63%) (**Figure 3B**, left panel), F(2,26) = 6.59, p < 0.01; whereas accuracy on valid trials did not differ among intervals (0 ms: 87%, 50 ms: 88%, 100 ms: 87%) (**Figure 3B**, left panel), F(2,26) = 0.14, p = 0.87. There was also a significant interaction between cue validity and target eccentricity, F(2,26) = 8.46, p = 0.001 (**Figure 3B**, three right panels). No other two-way or three-way interaction was significant.

#### **Response time**

Response speed differed among eccentricities (10°: 297 ms, 20°: 358 ms, 30°: 358 ms) (**Figure 3C**, three right panels), F(2,26) = 6.71, p < 0.01. Subsequent contrasts revealed faster response speed at 10° than 20°, F(1,13) = 25.01, p < 0.001, and faster speed at 10° than 30°, F(1,13) = 9.93, p < 0.01. Participants also responded faster in valid trials than in invalid trials (valid: 297 ms, invalid: 379 ms) (**Figure 3C**), F(1,13) = 47.91, p < 0.001. But the duration of the cue-target interval had no effect on response time (0 ms: 338 ms, 50 ms: 337 ms, 100 ms: 339 ms) (**Figure 3C**, left panel), F(2,26) = 0.01, p = 0.99. Overall, higher accuracy was associated with shorter RTs (**Figures 3B,C**), suggesting that there was no speed-accuracy trade-off.

# Experiment 2B

Findings from Experiment 2A showed that the directional cue affected participants' performance on the AVF task. However, given that participants responded with the direction of the target, there may be a guessing bias as the cue on target direction was highly valid. When participants were not sure about the direction of the target, they may have responded with the cued direction. To eliminate the possible influence of guessing bias in the effect of directional cueing, we changed the task from reporting the direction of the target to reporting the identity of the target in Experiment 2B. The target in the AVF task was one of two visually distinct objects and participants reported which object had been presented. Because there was no relationship between the identity of the target and the cued direction, participants were not able to improve performance by guessing.
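The guessing-bias argument can be illustrated with a toy model. It is not part of the original study: it assumes that on each trial the participant perceives the target with some probability `d` (a hypothetical parameter, identical on valid and invalid trials, so that the cue has no attentional effect at all) and otherwise guesses. The question is whether guessing alone can produce a validity effect.

```python
def direction_report_accuracy(d, guess_cued):
    """Expected (valid, invalid) accuracy when reporting one of four directions.

    d: hypothetical probability of perceiving the target, assumed equal on
       valid and invalid trials (i.e., no attentional benefit from the cue).
    guess_cued: if True, an unsure participant reports the cued direction;
       otherwise the guess is uniform over the four directions.
    """
    if guess_cued:
        # Guessing the cued direction is always right on valid trials
        # and always wrong on invalid trials.
        return d + (1 - d) * 1.0, d
    chance = 1 / 4
    accuracy = d + (1 - d) * chance
    return accuracy, accuracy


def identity_report_accuracy(d):
    """Expected (valid, invalid) accuracy when reporting one of two identities.

    Target identity is unrelated to the cue, so guessing helps equally on
    valid and invalid trials.
    """
    chance = 1 / 2
    accuracy = d + (1 - d) * chance
    return accuracy, accuracy


valid, invalid = direction_report_accuracy(0.6, guess_cued=True)
print(f"direction report, cue-biased guessing: valid {valid:.2f}, invalid {invalid:.2f}")
valid, invalid = identity_report_accuracy(0.6)
print(f"identity report: valid {valid:.2f}, invalid {invalid:.2f}")
```

In this toy model, cue-biased guessing creates an apparent validity effect in the direction-report task even though the cue does nothing to perception, whereas the identity-report task yields equal accuracy on valid and invalid trials unless the cue genuinely improves processing.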

#### Methods

#### **Participants**

Twenty undergraduates at the University of Toronto (8 males, 12 females; age range: 17–23 years) participated for course credit. All participants reported normal or corrected-to-normal vision.

#### **Task**

The task was very similar to that in Experiment 2A, except that participants reported the identity of the target (the target was now bisected by either a horizontal or a vertical line) rather than the direction of the target (**Figure 4A**). Only two cue-target intervals (0/80 ms) were used to limit the number of trials.

#### **Design**

We used a 3 × 2 × 2 within-subject repeated-measures design, whose factors were target eccentricity (10◦ /20◦ /30◦ ), cue-target interval (0/80 ms) and cue validity (valid/invalid).

#### **Stimuli**

All task settings were the same as in Experiment 2A, except the target was a filled dark-gray circle (2.2◦ × 2.2◦ ) with a dark-gray line (0.8◦ × 3.6◦ ) bisecting the circle either horizontally or vertically. After a blank interval of 0 or 80 ms, the stimulus display appeared for 40 ms, followed by a mask of 200 ms duration. Participants responded after the mask disappeared.

#### **Procedure**

The procedure was the same as in Experiment 2A, except that participants reported the identity of the target, pressing 'Z' for targets with the horizontal line and '/' for targets with the vertical line. Participants pressed 'Z' with the left hand, and '/' with the right hand. Participants were instructed to respond both accurately and quickly. The choices of target identity and the reaction times were recorded. After a practice session of 24 trials, participants completed 720 trials in a random sequence with the target appearing 30 times in each of the 12 locations, for each of the two cue-target interval conditions. Eighty percent of the trials were valid.
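The trial counts in this procedure can be checked with a short sketch. It is an illustration only, not the authors' experiment code: the function name `build_trials`, the integer location labels, the random choice of line orientation, and the assumption that the 80% validity was balanced within every location × interval cell (24 valid and 6 invalid trials per cell) are ours.

```python
import random

N_LOCATIONS = 12          # target locations (from the Procedure)
INTERVALS_MS = (0, 80)    # cue-target intervals (from the Design)
REPS_PER_CELL = 30        # presentations per location and interval
P_VALID = 0.8             # proportion of valid trials


def build_trials(seed=0):
    """Return a shuffled list of trial dictionaries (hypothetical layout)."""
    rng = random.Random(seed)
    # Assumption: validity is balanced within each cell, i.e. 24 of 30 valid.
    n_valid = round(REPS_PER_CELL * P_VALID)
    trials = []
    for location in range(N_LOCATIONS):
        for interval in INTERVALS_MS:
            for rep in range(REPS_PER_CELL):
                trials.append({
                    "location": location,
                    "interval_ms": interval,
                    "valid": rep < n_valid,
                    # Assumption: line orientation is drawn at random per trial.
                    "line": rng.choice(("horizontal", "vertical")),
                })
    rng.shuffle(trials)   # random presentation sequence
    return trials


trials = build_trials()
n_total = len(trials)
n_valid_total = sum(t["valid"] for t in trials)
```

Under these assumptions the list contains 12 × 2 × 30 = 720 trials, of which 576 are valid and 144 invalid.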

#### Results

A 3 × 2 × 2 (target eccentricity: 10◦ /20◦ /30◦ , cue-target interval: 0/80 ms, cue validity: valid/invalid) repeated-measures ANOVA was used to analyze the percentage of correct responses and response time data. We calculated the percentage of correct responses and average response time based on all trials of each combination of conditions.
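The aggregation step described above, which reduces raw trials to one percentage-correct value and one mean response time per combination of conditions, can be sketched as follows. This is a minimal sketch under our own assumptions about the data layout: the field names (`participant`, `eccentricity`, `interval_ms`, `valid`, `correct`, `rt_ms`) and the function `cell_means` are hypothetical. Following the text, response times are averaged over all trials in a cell rather than over correct trials only.

```python
from collections import defaultdict


def cell_means(trials):
    """Collapse trial rows into percent correct and mean RT per
    participant x eccentricity x interval x validity cell."""
    sums = defaultdict(lambda: [0, 0, 0.0])  # n trials, n correct, summed RT
    for t in trials:
        key = (t["participant"], t["eccentricity"], t["interval_ms"], t["valid"])
        cell = sums[key]
        cell[0] += 1
        cell[1] += int(t["correct"])
        cell[2] += t["rt_ms"]
    return {
        key: {"pct_correct": 100.0 * n_correct / n, "mean_rt_ms": rt_sum / n}
        for key, (n, n_correct, rt_sum) in sums.items()
    }


# Two made-up trials from one cell, for illustration only.
demo = [
    {"participant": 1, "eccentricity": 10, "interval_ms": 0, "valid": True,
     "correct": True, "rt_ms": 400.0},
    {"participant": 1, "eccentricity": 10, "interval_ms": 0, "valid": True,
     "correct": False, "rt_ms": 500.0},
]
demo_means = cell_means(demo)
```

The resulting per-participant cell values are the kind of input a repeated-measures ANOVA routine would take; the ANOVA itself is not reproduced here.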

#### **Percentage correct**


Accuracy in target detection varied among eccentricities (10◦ : 87%, 20◦ : 71%, 30◦ : 69%) (**Figure 4B**), F(2,38) = 68.53, p < 0.001. Subsequent contrasts revealed higher accuracy at 10◦ than 20◦ , F(1,19) = 111.37, p < 0.001, and higher accuracy at 10◦ than 30◦ , F(1,19) = 78.17, p < 0.001. Accuracy was higher on valid trials than invalid trials (valid: 77%, invalid: 74%), F(1,19) = 10.20, p < 0.01. There was a significant interaction between cue validity and cue-target interval, F(1,19) = 10.06, p < 0.01. Subsequent analyses showed that when the cue-target interval was 0 ms, there was little difference in accuracy between valid and invalid conditions (valid: 77%, invalid: 76%) (**Figure 4B**, left panel), F(1,19) = 1.40, p = 0.25. However, with an interval of 80 ms, the difference was significant (valid: 78%, invalid: 72%) (**Figure 4B**, right panel), F(1,19) = 18.88, p < 0.01. No other two-way or three-way interaction was significant.

#### **Response time**

Response speed was different among eccentricities (10◦ : 449 ms, 20◦ : 494 ms, 30◦ : 545 ms) (**Figure 4C**), F(2,38) = 16.62, p < 0.01. Response was much faster at 10◦ than 20◦ , F(1,19) = 22.60, p < 0.001, and faster at 20◦ than 30◦ , F(1,19) = 19.81, p < 0.001. Participants responded faster with valid cues than with invalid cues (valid: 463 ms, invalid: 529 ms), F(1,19) = 44.91, p < 0.01. When the cue-target interval was zero, the mean reaction time was shorter with valid cues than with invalid cues (valid: 476 ms, invalid: 534 ms) (**Figure 4C**, left panel), F(1,19) = 20.41, p < 0.01. With an interval of 80 ms, participants responded faster with valid cues than with invalid cues (valid: 450 ms, invalid: 525 ms) (**Figure 4C**, right panel), F(1,19) = 34.39, p < 0.01. Higher accuracies were generally accompanied by faster response times (**Figures 4B,C**), suggesting that there was no speed-accuracy trade-off.
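As a purely descriptive illustration (not the authors' analysis), the eccentricity means reported above for Experiment 2B can be used to check the direction of the relationship between accuracy and response time. A speed-accuracy trade-off would appear as higher accuracy accompanying slower responses, that is, a positive correlation. The helper `pearson_r` is our own, and a correlation over three condition means carries no inferential weight.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Means at 10, 20, and 30 degrees, as reported for Experiment 2B.
accuracy_pct = [87, 71, 69]
rt_ms = [449, 494, 545]

r = pearson_r(accuracy_pct, rt_ms)
no_tradeoff = r < 0  # negative sign: higher accuracy goes with faster responses
```

The sign of `r` is negative for these means, which is the pattern the text describes as inconsistent with a speed-accuracy trade-off.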

# Discussion

Experiments 2A and 2B demonstrated that expectation of the direction of the target can change the distribution of attention. An increase of the cue-target interval produced a more pronounced effect with endogenous covert orienting of attention. This is evident in both Experiments 2A and 2B. In particular, in Experiment 2B, as the cue-target interval increased from 0 to 80 ms, participants' percentage of correct responses became significantly higher when the target appeared in the expected direction (valid conditions) compared to when the target appeared in unexpected directions (invalid conditions). Presumably, with a longer cue-target interval, participants had more time to form their expectation of the likely direction (Posner, 1980; Shepherd and Müller, 1989). Thus the difference in accuracy between the valid and invalid conditions became greater. In both Experiments 2A and 2B, this greater difference with a prolonged interval was caused by further impairment in accuracy by an invalid cue, rather than by an enhancement in accuracy by a valid cue. This may imply that, when discrimination and identification of a target among distractors are necessary, expectation of the direction of the target improves performance mostly by inhibiting the unexpected directions between 80 and 180 ms (160 ms in Experiment 2B) following the onset of the cue.

# GENERAL DISCUSSION

The present experiments examined how two forms of expectation induced by spatial cues affect the distribution of attention. As predicted, the results suggested that a participant can control the size of the area in which attention is deployed, and can covertly orient attention in a particular direction. Modification of the distribution of attention is an efficient mechanism for enhancing attentional performance when the cue is valid. The cues used in the experiments were endogenous (i.e., directed participants' attention to the spatial location of the target). Endogenous covert distribution of attention induced by an expectation of the target is an efficient mechanism for enhancing attentional performance when the prediction is highly accurate. We measured covert attention that reflected the effect of pre-cueing attention without an eye movement. Our experiments were designed to eliminate voluntary eye movements that could benefit task performance based on the cue. In Experiment 1, although the interval between the onsets of a cue and a target was long (e.g., 800 ms in Experiment 1B), an eye movement would not have been beneficial as a target could occur in any direction within the visual area. In Experiment 2, although an eye movement toward the cued direction could be beneficial, the cue-target interval was too brief for an eye movement to be executed. A significant advantage of endogenous covert distribution of attention across the extended attentional visual field is that detection of a target that appears some distance from fixation is faster. Endogenously orienting attention to a new location can be faster than making an eye movement (Johnson and Proctor, 2004), and usually contributes to a subsequent eye movement to the attended location (Peterson et al., 2004).

We found that with a larger size of the cued area, overall accuracy of target detection was lower, implying a reduction of the average attentional intensity within the attended area. This inverse relationship between the size of the attended area and the attentional intensity (Feng and Spence, 2013, p. 154) was first documented by Wolff (1738; 1740, translated and interpreted in Hatfield, 1998) in his Psychologia Empirica (1738) and Psychologia Rationalis (1740). Later, similar ideas were implied in Titchener's (1908) Law of Two Levels and also the zoom-lens model of selective attention (Eriksen and Yeh, 1985; Eriksen and St. James, 1986). In the zoom-lens model, it is assumed that attention is distributed evenly across the selected area except that there is a gradual decrease of the attentional intensity near the boundary (Eriksen and St. James, 1986); whereas performance on the AVF task suggests that the default (uncued) distribution of attention is more like a unimodal probability distribution, with lower attentional intensity at locations further from fixation. Experiment 1 also suggested that it was more difficult to modify than to maintain the size of the to-be-attended area. This increase in difficulty was particularly pronounced when the size of the to-be-attended area had to be reduced. Future explorations of potential underlying neural mechanisms are necessary.

Our results suggested that modifying the size of the attended area takes time to complete. This is evident in both Experiments 2A and 2B. In Experiment 2A, the advantage on valid trials, compared to invalid trials, was greater with a longer cue-target interval (comparing among intervals of 0, 50, and 100 ms). In Experiment 2B, the advantage on valid trials increased considerably when the interval was increased from 0 to 80 ms (from a non-significant difference in accuracy and RT, to a much higher accuracy and faster RT on valid trials). Improved performance with a valid endogenous cue and a longer cue-target interval has been demonstrated at eccentricities of 10◦ and 20◦ (Shepherd and Müller, 1989). But in the Shepherd and Müller (1989) study, the target was presented without distractors, thus only target detection (no discrimination or identification) was necessary. Experiments 2A and 2B suggested that this effect holds when discrimination, localization, and identification (finding the target in the presence of distractors) are also involved. And the effect holds not only at eccentricities of 10◦ and 20◦ , but also at an even more extreme eccentricity of 30◦ . Moreover, the benefit from a valid cue is progressively greater at locations further from fixation. Furthermore, in both Experiments 2A and 2B, with a longer cue-target interval, a valid cue did not further facilitate identification of the target, but an invalid cue further impaired the identification. This differs from the findings in Shepherd and Müller (1989). In Shepherd and Müller (1989), with a longer time following the onset of the cue (increased from 50 to 150 ms), the accuracy of target detection was further enhanced with a valid cue, and further impaired with an invalid cue. Notably, in Shepherd and Müller (1989), there was no distractor presented together with the target; in contrast, in Experiments 2A and 2B, the stimuli consisted of a target and eleven distractors. This may imply that, when only detection is involved, both enhancement in the expected direction and inhibition in the unexpected directions occur (between 50 and 150 ms following the onset of the cue). However, if discrimination and identification are also necessary, inhibition in the unexpected directions may play the major role (between 80 and 180 ms following the onset of the cue).

# A Conceptual Model of Spatial Attention across the Visual Field

Based on our findings, we conceptualize the spatial distribution of attention as a bivariate probability distribution over the visual field (**Figure 5A**; Feng, 2011), similar to the idea describing the distribution of attention as a gradient of the attentional resource around the focus of attention (Baldwin, 1889; Downing and Pinker, 1985; Eriksen and Yeh, 1985; LaBerge and Brown, 1989; James, 1890; Palmer, 1990; Müller et al., 2005). Our model specifically considers a large area of the visual field. The probability density at any particular location in the visual field represents the attentional intensity corresponding to that location (Feng and Spence, 2013, p. 154). In general, the attentional intensity decreases with an increase in the distance from the fixation (Feng and Spence, 2014; Feng et al., 2016). Influences from expectations induced by pre-cueing were assumed to modify the distribution of attention. When a participant expects a target to appear anywhere within a large area, the spread of the distribution of attention is larger, to accommodate the greater uncertainty (**Figures 5B,C**), thus attentional processing in the periphery is increased. Consequently, since attention is assumed to be a fixed resource, the distribution flattens as it spreads. However, when a participant expects a target to appear in a particular direction, the distribution of attention is gradually shifted in that direction (**Figure 5D**), resulting in an increase in attentional processing of information in the expected direction. Our earlier work also suggests that there are preexisting biases in the attentional distribution (Feng and Spence, 2014; Feng et al., 2016). However, it is important to note that the simple model proposed in this paper is rudimentary and was intended to provide a qualitative description. Elaboration of the model could specify particular probability distributions and how these may be modified; further examination is necessary to specify the model in more detail. It is also critical to point out that this descriptive model is only intended to describe possible spatial mechanisms of attention, which is only one aspect of the operation of attention. There are many non-spatial processes of attention (e.g., Egly et al., 1994; Moore et al., 1998; Lupyan, 2008; Lupyan and Spivey, 2010; Baldauf and Desimone, 2014). For example, when the images of a face and a house are superimposed (thus the two objects are at the same spatial location), we can choose to pay attention to either one and ignore the other (Baldauf and Desimone, 2014). Another example is cross-modal facilitation given category-based attention (Lupyan and Spivey, 2010). Our visual processing of an item is enhanced when we hear a similar label. The proposed descriptive model is limited to the spatial processing of attention; it does not capture these non-spatial attentional processes, nor does it explain potential interactions between space-based and object-based mechanisms.
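The qualitative behavior of this conceptual model can be illustrated with a toy sketch. The model itself does not commit to any particular distribution; the isotropic bivariate Gaussian, the function name `attention_density`, and the values of `sigma` and `shift` below are our assumptions, chosen only to show how spreading flattens a fixed-volume distribution and how shifting it raises the density in the expected direction.

```python
import math


def attention_density(x, y, sigma, shift=(0.0, 0.0)):
    """Density of an isotropic bivariate Gaussian at (x, y), in degrees
    from fixation. sigma sets the spread; shift moves the peak.
    The Gaussian form is an illustrative assumption, not the model."""
    dx, dy = x - shift[0], y - shift[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (
        2 * math.pi * sigma * sigma
    )


# Hypothetical spreads for a small versus a large expected area.
narrow, wide = 10.0, 20.0

# Spreading: the peak at fixation drops, the periphery (30 deg) gains.
peak_narrow = attention_density(0, 0, narrow)
peak_wide = attention_density(0, 0, wide)
periphery_narrow = attention_density(30, 0, narrow)
periphery_wide = attention_density(30, 0, wide)

# Shifting: a hypothetical 10 deg rightward shift raises the density on the
# expected side (x = 20) and lowers it on the opposite side (x = -20).
expected_shifted = attention_density(20, 0, narrow, shift=(10.0, 0.0))
expected_default = attention_density(20, 0, narrow)
opposite_shifted = attention_density(-20, 0, narrow, shift=(10.0, 0.0))
opposite_default = attention_density(-20, 0, narrow)
```

Because the total volume under the density is fixed at 1, a larger `sigma` necessarily lowers the peak, which mirrors the fixed-resource assumption described above.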

# Implication of Eccentricity Effect on Cognitive Penetration of Perception

In the traditional views, attention has been conceptualized as a passive filter or gate-keeper (Broadbent, 1958; Posner, 1980) that selectively facilitates the processing of some information (e.g., targets) while inhibiting other information (e.g., distractors). More recent views challenge this conceptualization and propose a role for attention in the active construction of perceptual representation under the influence of cognition (e.g., Lupyan, 2015; Nanay and Fazekas, 2017). This proposal on cognitive penetrability of perception suggests that the purpose of attention is to predict, to transform incoming sensory energy "into a useful form for a particular perceptual goal" for minimizing "global prediction error" (Lupyan, 2015, pp. 553, 564). In the debate over whether perception is cognitively penetrable via attention, a significant divide lies in the domain of evidence supporting each view. The passive filter/gate-keeper view has been built on research findings on the spatial processing of attention (e.g., the spotlight metaphor of attention; Posner et al., 1980; Eriksen and Yeh, 1985; Awh and Pashler, 2000). In contrast, the more recent cognitive-penetrable view of perception is supported by much evidence on the non-spatial processing of attention (e.g., Egly et al., 1994; Moore et al., 1998; Lupyan, 2008; Lupyan and Spivey, 2010; Baldauf and Desimone, 2014). It is possible that space-based attention and object-based attention are two streams of attentional processing that serve distinct purposes and are based on different mechanisms. Thus, the non-spatial processing of attention could be much more open than spatial attention to cognitive penetration of perception.

However, is spatial attention completely immune to this cognitive penetration? One approach to this question is to examine whether the attentional modulation in pre-cueing takes place before perceptual processing (e.g., Pylyshyn, 1999; Lupyan, 2015). Another way to answer the question is by making a distinction between early and late vision (e.g., Raftopoulos, 2009). Here we propose a third way to look at the question, that is, to explore potential interactions between space-based and non-space-based attentional mechanisms. For example, it is possible that the size of the area that attentional selection is operating on and the object representation in attention are related. When we are looking for an object in a larger area, given extended space for simultaneous visual processing, we could be working with a more simplified representation of the object with reduced dimensions of features to allow efficient processing. This would especially make sense when we consider attentional processing across the visual field, as the visual periphery would only allow coarse processing. For instance, when one looks for a friend on the street without much idea of where the friend might be, the attentional mechanism may be based just on clothing color and general body shape. If the individual knows which street corner the friend is at, he/she could use a more detailed representation for target detection. A smaller area may lead to more concentrated spatial attention; however, with a more detailed object representation, target detection may not be faster. This could be a possible reason why we observed no change in target detection at an eccentricity of 10◦ across pre-cue conditions in our Experiment 1 (particularly Experiment 1A, in which significant changes were observed at eccentricities of 20◦ and 30◦ ). Nevertheless, this account is speculative at this stage and our current experiments were not specifically designed to evaluate this hypothesis. Future experimental work, exploring attentional processing across an extended visual area, is needed to carefully examine this speculation.

# Eccentricity as an Important Factor in Understanding Attention

In our experiments, the size of the stimuli was kept constant across eccentricities. Therefore, we cannot distinguish between impacts from cortical and subcortical mechanisms on our eccentricity effect. One way to separate these impacts is to increase stimulus size as its location becomes more peripheral (i.e., M-scale the stimuli; Rovamo and Virsu, 1979). Several studies have attempted to contrast the results with and without M-scaling (Carrasco et al., 2003; Bao et al., 2013; Staugaard et al., 2016), and found that eccentricity effects were not completely eliminated by M-scaling, suggesting that the eccentricity effects were a combination of cortical magnification and other attentional mechanisms. It is possible that the eccentricity effects found in our experiments were also a combination of various neurophysiological and attentional mechanisms. Further examination is needed to isolate the impact from each individual mechanism.
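To make the idea of M-scaling concrete, the sketch below enlarges a stimulus with eccentricity using a simple linear rule, size ∝ 1 + E/E2, normalized so that the stimulus keeps its original size at a reference eccentricity. This is an illustration under our own assumptions and not the formula of Rovamo and Virsu (1979): the function `m_scaled_size`, the reference eccentricity of 10◦, and the value of the parameter `e2` are hypothetical. Only the 2.2◦ target diameter is taken from the Stimuli section.

```python
def m_scaled_size(base_size_deg, ecc_deg, ref_ecc_deg=10.0, e2=2.0):
    """Stimulus size (deg) at ecc_deg under a linear scaling rule.

    Size grows in proportion to (1 + E / e2) and equals base_size_deg at
    ref_ecc_deg. e2 is a hypothetical free parameter: the eccentricity at
    which the unnormalized scaling factor doubles relative to the fovea.
    """
    return base_size_deg * (1 + ecc_deg / e2) / (1 + ref_ecc_deg / e2)


# Target diameter used in the experiments (constant across eccentricities).
unscaled = 2.2

scaled = {ecc: m_scaled_size(unscaled, ecc) for ecc in (10, 20, 30)}
```

With these assumed parameters the scaled stimulus stays at 2.2◦ at 10◦ of eccentricity and grows monotonically at 20◦ and 30◦, whereas the experiments reported here kept it at 2.2◦ throughout.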

In many daily tasks, attention must operate over a very large visual area to achieve superior performance. For example, older adults are generally less able to identify important events in a cluttered visual environment across the visual field and this decline in selective attention can lead to poorer driving performance (Ball et al., 1990, 1993; Bedard et al., 2006) and higher risks of falls (Lajoie et al., 1996; Broman et al., 2004; Owsley and McGwin, 2004). During spatial navigation, blocking a participant's peripheral vision leads to severe impairment in wayfinding (Fortenbaugh et al., 2006). However, in the empirical efforts to understand pre-cueing effects in the laboratory, visual attentional processing, especially endogenous attention (i.e., cognitively driven pre-cueing), has rarely been examined beyond an eccentricity of 20◦ . Given the intense coupling between attention and saccades (Sheliga et al., 1994; Hoffman and Subramaniam, 1995; Peterson et al., 2004), and a significant change in the saccadic characteristics at 20◦ of eccentricity (e.g., plateau of amplitude; Bahill et al., 1975), exploring visual attention across a large visual area that extends beyond 40◦ of visual angle is important. Our study found that the effect of pre-cueing (using endogenous spatial cues) was in general greater at larger eccentricities. This highlights the capability of our attentional system, which allows us to make intensive use of our visual periphery for many daily tasks despite its sensory limitations. An important aspect that distinguishes our second experiment from many earlier attentional studies of endogenous direction cues (e.g., Bashinski and Bacharach, 1980; Posner, 1980) is the examination of the eccentricity effect across an extended visual area (60◦ of visual angle).
Our results showed that attention can shift to a very distant area (at least up to 30◦ of amplitude) within a relatively brief period of time (even an 80 ms interval between the onsets of a cue and stimulus significantly speeded response). Although a shift of gaze larger than 20◦ is often accompanied by a head movement, our capability to quickly orient attention to relatively peripheral areas may help us to improve our overall evaluation of the visual stimuli and to determine more accurately where the gaze will move.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Guide for Informed Consent and the Guidelines for Ethical Conduct in Participant Observation, Research Ethics Boards of the University of Toronto, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Social Sciences, Humanities and Education Research Ethics Board of the University of Toronto.

# AUTHOR CONTRIBUTIONS

JF and IS contributed to the concept of the study. JF conducted the experiment and analyzed the data, under the supervision of IS. JF and IS contributed to the interpretation of the results and preparing the manuscript.

### ACKNOWLEDGMENTS

The experiments and findings reported in this paper were part of JF's doctoral dissertation, which was completed at the University of Toronto under the guidance of IS. We would like to thank Dr. Gary Lupyan and the two reviewers for their insightful feedback that helped to improve this manuscript.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Feng and Spence. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.