Re-Appreciating the Why of Cognition: 35 Years after Marr and Poggio

Marr and Poggio’s levels of description are among the best-known theoretical constructs of twentieth century cognitive science. The scheme entails that behavior can and should be considered at three different levels: computation, algorithm, and implementation. In this contribution the focus is on the computational level of description, the level that describes the “why” of cognition. I argue that the computational level should be taken as a starting point in devising experiments in cognitive (neuro)science. Instead, the starting point in empirical practice is often the stimulus or some capacity of the cognitive system. The “why” of cognition tends to be ignored when designing research, and is not considered in subsequent inference from experimental results. The overall aim of this manuscript is to show how re-appreciation of the computational level of description as a starting point for experiments can lead to more informative experimentation.


INTRODUCTION
In 1976 Marr and Poggio published an internal MIT memo in which they introduced the levels of description for cognitive and neural processes (Marr and Poggio, 1976). This scheme of description was elaborated by Marr (1982) in his book Vision and it is fair to say that it has had a tremendous influence on the multidisciplinary field of cognitive science. Here I call for a renewed appreciation and redefinition of one of these levels of description when designing and interpreting experiments in cognitive neuroscience. The levels are the computational level (the goal or the why of cognition), the level of representation and algorithm, and the level of physical implementation. As an example Marr (1982) considers a cash register at a store. First, the computational level specifies what the cash register does and why. What it does is addition and the reason it does addition is that "the rules we intuitively feel to be appropriate for combining the individual prices [of purchased items] in fact define the mathematical operation of addition" (Marr, 1982, p. 22, my addition). Second, the level of representation or algorithm specifies how the cash register performs its computational function, for instance by using Arabic numerals, adding the least significant (rightmost) digits first, and carrying to the next column if the sum exceeds 9. Finally, there is the implementational level, which specifies the device that implements the computational function of the cash register. This can be an electronic cash register, but it could also be an abacus (Marr, 1982).
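The algorithmic level of the cash register example can be made concrete with a minimal sketch. The code below is an illustration only and does not come from Marr (1982): the function name add_digit_strings and the choice of Python are assumptions made here. It represents prices as strings of Arabic numerals, adds the rightmost digits first, and carries to the next column when a column sum exceeds 9.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Add two non-negative integers written as strings of Arabic numerals.

    Hypothetical illustration of the algorithmic level: column-wise addition
    starting at the least significant (rightmost) digit, with carrying.
    """
    digits = []            # result digits, least significant first
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        column_sum = da + db + carry
        digits.append(str(column_sum % 10))  # digit that stays in this column
        carry = column_sum // 10             # 1 if the column sum exceeds 9
        i -= 1
        j -= 1
    return "".join(reversed(digits)) or "0"


print(add_digit_strings("275", "149"))  # prints 424
```

The same computational-level function (addition) could equally be realized by a different representation and algorithm, or on a different device, which is the point of keeping the three levels apart.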
In this contribution I call for a renewed appreciation of the computational level of description, or the why or goal of a given behavior. The computational level is inherently context-dependent and should be taken as the starting point for designing experiments in cognitive science, which is presently often not the case.

DEFINING THE COMPUTATIONAL LEVEL
In my understanding it is important to stress the context-dependence of description at the computational level. What a cognitive system has to do and why it is doing this will vary depending upon the motivation of the organism and on the context the behavior occurs in. Context-dependence is often not considered when specifying the computational goal of a given behavior and this is probably all right for cases such as a cash register, which performs the same computational function regardless of context. However, for human cognition viewing the computational function as context-dependent is key to our understanding and investigation of cognition.
Consider the computational goal of reading. This depends heavily on the intention of the reader. Do we read poetry for relaxation? Are we reading our car's manual because the lights are broken? Do we read a scientific manuscript because we want to extract the main message of the paper, or because we have to proofread it for spelling errors? Framing the computational goal as "to read" or "to extract information" is incomplete and uninformative.
Clearly, there is commonality in all instances of reading: the person perceives visual symbols, recognizes them as words and sentences and distils meaningful information from the text. This overlapping part is what gets stressed most often in cognitive science. However, to understand cognition we need to take the goal of the behavior into account and we want this computational level to be guiding our research. Once we do this, we see that reading under different circumstances can lead to qualitatively different behaviors, which are better characterized as arising from different computational goals than as the outcome of one computational goal (e.g., Zwaan, 1994; van den Broek et al., 2001; Kaakinen and Hyönä, 2010).
Therefore an appropriate description of the computational level for an organism is one that takes context-dependence into account: what is the computational goal for the organism at the present moment? This is a different description than the one advanced by Marr and Poggio, which is not sensitive to context. It is possible to define very high-level context-independent computational goals such as "survival" or "acting in the world," but this seems only informative in framing very general aspects of behavior (see Discussion). Instead, what we want is a specification that can guide actual empirical research. What is the computational goal for the organism at the present moment?

THE WHY OF COGNITION AS A STARTING POINT FOR EXPERIMENTATION
In a large part of the literature there are, implicitly, two strategies for studying a given topic: stimulus-driven and capacity-driven. I will describe both research strategies in turn and will illustrate how taking the computational level of description as a starting point can lead to different experimental manipulations, which in turn lead to different conclusions concerning the research topic under study.

STIMULUS-DRIVEN RESEARCH
Stimulus-driven research takes a certain class of stimuli as starting point for experiments. The strategy is to "input" the stimulus into the system and measure a dependent variable (e.g., reaction times) that relates to processing of the stimulus. The computational goal for the system then is to process or represent the stimulus. As an example, consider the study of object perception.

Example: perception of graspable objects
Research shows that observation of graspable/manipulable objects leads to motor facilitation. For instance, when perceiving a pen or a coffee mug participants are faster to give a manual response as compared to when perceiving objects such as a traffic light or a bookcase. In neural terms, viewing pictures of manipulable objects leads to increased activation in parts of sensori-motor cortex involved in actual grasping (see Martin, 2007 for review). The rationale is that we act upon pens and coffee mugs in order to use them and that this action is part of our understanding of such objects. Hence when we perceive them we automatically activate the action program that we usually use to act upon the object. In a strong illustration, Tucker and Ellis (2004) showed that motor facilitation is observed even when an object is presented very briefly, or when it is visually masked, rendering the picture invisible.
The emphasis in this line of work is on the stimulus and the computational level is fixed: perception of a manipulable object will always lead to computation of motor programs appropriate for handling the object. If one specifies the computational level in this way, one will not choose to test context-dependence and hence the observed effect will appear to be context-invariant. Specifying the computational level at the level of the organism ("Why does the organism perceive the object?") will lead to a different experimental strategy in investigating object perception. Tipper et al. (2006) indeed found that activation of a motor program in response to observation of objects depends upon the computational goal with which the object is perceived. The shape of a door handle (round or square-shaped) influenced reaction times when participants had to judge the shape of the door handle, but not when they had to judge the color of the door handle. Action facilitation after object perception only occurs when the attended property of an object matters for action execution (see also Bub and Masson, 2006; Girardi et al., 2010; Taylor and Zwaan, 2010).
Setting aside the intricacies of these experiments, what is important to realize is that the task manipulation follows naturally from a specification of the computational goal for the participant: how an object is perceived will depend upon what the perceiver wants to do with the object. By taking a stimulus-driven approach one would not consider varying the context in which objects are perceived in a relevant manner, and would miss out on an important aspect of object perception.
In summary, what I argue against is an exclusive focus on the stimulus in designing experiments. One may object that stimulus-driven research hardly exists, since most experiments in cognitive science do employ a behavioral task. That is, the participant is often doing something, so there is a "Why of cognition." However, the task is often merely used as a way to get a dependent variable (e.g., RT), or to make sure participants do not fall asleep (in neuroimaging). The task is detached from a reasonably realistic goal in the real world. For instance, in my own research, I asked participants to press a button when they saw a repetition of a picture of a face within a task block (Willems et al., 2010). This task was only added to make sure that participants would look at the screen and remain attentive. The results of the study are subsequently extrapolated to face perception per se, actively ignoring the task factors under which face perception was measured. The task limits the conclusion we can draw from this research: it describes what happens when a face picture falls on the retina under these restricted circumstances. Instead, it seems better to treat face perception not as one of a kind, but to consider how the intention of the perceiver influences face perception (e.g., Cohen Kadosh et al., 2010). Ideally, the focus should not be only on the stimulus; the implemented task should also be ecologically valid. That is, it should be driven by consideration of what we actually do during real-life face perception.

CAPACITY-DRIVEN RESEARCH
Capacity-driven research investigates a cognitive capacity and extrapolates from this to explain a phenomenon at large. The rationale is that by investigating a restricted capacity of the system one gains insight into its "regular" functioning. The problem with this type of research is that the computational level is not specified at the level of the present goal for the organism, but is inspired by a given capacity of the system.

Example: recursion and embedding in sentence comprehension
Humans are capable of creating and understanding syntactic structures exhibiting recursion. Consider this example: "The mailman and the mother of Jim love the woman who Kate burnt" (Santi and Grodzinsky, 2007, p. 10). We are able to find out who did what to whom because we are able to process these recursive structures, and recursion has been claimed to be the unique feature distinguishing humans from other animals (Hauser et al., 2002). This inspired research looking into neural processes underlying the processing of recursive sentences (e.g., Friederici et al., 2006; Santi and Grodzinsky, 2007).
The capacity-driven perspective defines the main task of the language user as reconstructing the underlying syntactic structure of an utterance: it places emphasis on the fact that we have this capacity. In the example above, the task for the system is to engage in syntactic analysis in order to find out who did what to whom. In this perspective the computational goal is fixed: we always engage in reconstruction of the underlying syntactic structure in order to understand the sentence. Similarly, if I hear the sentence "You guys going anywhere tonight?" the cognitive system may compute the "correct" syntactic structure capturing this sentence, including a representation of the omitted verb "are." Alternatively we can conceive of language as a means of effectively sharing and communicating intentions, in this case the intention being that the other person wants to know where we are going. In the latter formulation, a full reconstruction of the underlying syntactic representation of the sentence is not needed and does not (necessarily) take place.
Ferreira and colleagues show that the language system indeed regularly does not engage in extensive syntactic analysis. For instance, participants give the wrong answer to the question who the agent is in sentences like "The dog was bitten by the man," even though they are not under time pressure to respond and they clearly spot the anomaly when asked to (Ferreira, 2003). This suggests that extensive syntactic analysis is not obligatory and perhaps not as regularly performed as suggested by the line of research which places strong emphasis on recursion. Instead of relying on syntactic analysis, the system probably applies heuristics when reading this type of sentence and does not compute the underlying syntactic structure (see Ferreira et al., 2002; Ferreira and Patson, 2007 for extensive discussion).
The take-home message from this example is that a seemingly sensible object of study becomes less sensible once we consider the computational goal of language comprehension. What looks like a cornerstone of language to some may turn out to be an interesting anomaly, with very limited relevance to the understanding of language comprehension. Different researchers will argue for different characterizations of the computational goal of language comprehension¹.
Whatever the correct answer is, such discussion does not arise until one formulates the computational level of description. After formulation of the computational goal of language understanding, experimental manipulations can follow naturally from this description. This is an importantly different route from taking the fact that the system has a given capacity (understanding syntactically very complex sentences) as the starting point for experimentation.
¹ But note that there is evidence which suggests that recursion is not as vital to language understanding as suggested. Corpus work shows that sentences with long-range (wh-)dependencies are a subset of language that (a) are hardly ever used and (b) share a range of common characteristics not at the syntactic level (for instance, they are almost always used with "think"; Verhagen, 2010).

Is capacity-driven research at the level of algorithm?
In defense of capacity-driven research one could argue that such research is at the level of the algorithm. Research on syntactically very complex sentences shows how the brain deals with syntactic complexity in language. This is of course true, and if one's research goal lies in characterizing what happens at the algorithmic level of description, it is a legitimate approach. However, conclusions based upon capacity-driven research are almost never confined to the algorithmic level, but are presented as providing information about the cognitive phenomenon at large. This is a mistake, and consideration of the computational level of description would lead to experimentation which is more informative for understanding cognition.

WHAT I AM NOT CLAIMING
I argued that the why of cognition should be the starting point for designing experiments. I gave examples of how present research tends to take a stimulus or a given capacity of the cognitive system as a starting point instead.
Before I conclude, a few things I am not claiming:

STIMULUS- OR CAPACITY-DRIVEN RESEARCH IS "UNINTERESTING"
That some research is stimulus- or capacity-driven does not imply that the research is not interesting. It may be interesting for those interested in response properties of single neurons or for understanding mechanisms of lateral inhibition, to study these phenomena. There is nothing inherently wrong with studying the response to isolated stimuli such as faces, lines, or to look at what happens when participants read syntactically difficult sentences.
My main objections are that (i) researchers should clearly specify why they study what they study and (ii) the conclusions drawn from research should be confined to the experimental setting under which the data are acquired. So if you study single cell responses to single lines, you need to come up with a strong linking hypothesis in order to argue that the findings are relevant to understanding visual perception as it is (Teller, 1984).

THERE IS NO SPECIALIZATION IN THE COGNITIVE SYSTEM
"Context-dependence" does not mean that a given brain region can do any type of computation. There is clearly specialization in the brain, the extent of which is a matter of ongoing debate. Indeed, there may be neurons in visual cortex that perform the computation "edge detection." This tells us something about the response properties of these neurons, but has very limited consequences for our understanding of visual perception as it is for the organism in the world 2 .
Moreover, properties of low-level visual regions can change dramatically depending on the behavioral state (Lamme et al., 1998) or the time window at which the neuron is assessed (Lamme and Roelfsema, 2000).

CONCEPTUALIZATION SHOULD BE COMPLETE BEFORE EMPIRICAL RESEARCH CAN START
It is easy to take remarks on defining the computational goal to an extreme by demanding that every cognitive phenomenon under study must first be fully and precisely defined through conceptual analysis (Bennett and Hacker, 2003). This is not my intention. Empirical research does influence conceptualization just as conceptual analysis does steer empirical research. They do and should go hand in hand.

COGNITIVE (NEURO)SCIENCE IS A MESS
I painted a relatively bleak picture of experimentation in cognitive (neuro)science. Exaggeration is an effective rhetorical strategy, and not all experimentation in the field ignores the computational goal in designing experiments, as the informed reader will surely be aware. One example is recent work in action observation, which explicitly manipulates the goal for the observer (e.g., Spunt et al., 2010; see also Brass et al., 2007; de Lange et al., 2008; Ortigue et al., 2010).

THIS IDEA IS NOVEL
For the purpose of readability I have not referred to other scholars who have coined related ideas. The present proposal bears (often obvious) links to, among others, Dennett's intentional stance position (Dennett, 1987), situated or embodied cognition (Clark, 1997; Gallagher, 2005; Barsalou, 2008; Robbins and Aydede, 2008), situated robotics (Brooks, 1991; Pfeifer and Bongard, 2007), pragmatics (Sperber and Wilson, 1986; Clark, 1996; Levinson, 2000), visual science (Gibson, 1966, 1979; O'Regan and Noe, 2001; Noe, 2004), and other reformulations of Marr's scheme (e.g., Shallice, 1988). I hope that the framing of the idea and its relation to the examples I gave will help in getting the idea of functional relevance to work in the practices of cognitive neuroscience. I apologize to those whose work should have been mentioned in this list.

CONCLUDING REMARKS
"What under normal circumstances this organism is never confronted with is the pure stimulus, free from all interpretation" (Berthoz and Petit, 2008, p. 20).
The study of cognition is plagued with research traditions that effectively ignore the why of cognition. A lot of emphasis is put on the stimulus, or focus is on some capacity of the system. The existence of a stimulus or of a given capacity of the system should never be taken as the starting point for research. To paraphrase Berthoz and Petit: if the organism is never confronted with the pure stimulus, why pretend to study the response to a pure stimulus?
In my opinion cognition cannot be appropriately understood without taking context into account. This precludes a description at the computational level which is an exhaustive description for all occurrences of a cognitive process. So, general characterization can serve as a shorthand and guiding principle for research (e.g., "vision is for action," or "language is to establish truth conditions"), but an adequate description at the computational level should not treat cognition as something which occurs invariantly, or in the void.
A different and popular way of phrasing the computational level is at the level of a given subpart of the system, such as a brain region. For instance, the computational goal of the fusiform face area is to perform computations necessary for a perceiver to recognize or perceive a face. Here one ascribes functions to a subpart of a system, when in fact they are functions of the system as a whole (see Bennett and Hacker, 2003; Wheeler, 2005 for discussion). If one believes that brain regions perform one circumscribed function, every single time this cognitive function is performed, one will focus on commonalities of neural responses in different experimental situations (cf. Mesulam, 1990, 1998). By doing this, it is easy to miss out on the flexibility of the system as a whole, because differences between situations are essentially brushed aside in favor of a focus on the common aspect of neural responses. On the other hand, if one takes flexibility of localization seriously, it will become an important experimental question under which circumstances a given set of brain regions supports a cognitive process and under which circumstances the network is slightly (or not so slightly) different. Formulating the why of cognition will help in phrasing the relevant experimental manipulations to investigate the flexibility of involvement of brain areas in parts of cognition, and to see if our hypothesis on cognitive function is correct or not (see Willems and Casasanto, 2011 for an example from language research).
In conclusion, I hope that these remarks will motivate researchers to ask this question: what do I think the goal of a given behavior is, and how is this going to influence the way in which I do my experiment?