Abstract
Recent advances in Computer Vision and Experimental Neuroscience have provided insights into the mechanisms underlying invariant object recognition. However, due to the different research aims of the two fields, models have tended to evolve independently. A tighter integration between computational and empirical work may contribute to the cross-fertilized development of (neurobiologically plausible) computational models and computationally defined empirical theories, which can be incrementally merged into a comprehensive brain model. After reviewing theoretical and empirical work on invariant object perception, this article proposes a novel framework in which neural network activity and measured neuroimaging data are interfaced in a common representational space. This enables direct quantitative comparisons between predicted and observed activity patterns within and across multiple stages of object processing, which may help to clarify how high-order invariant representations are created from low-level features. Given the advent of columnar-level imaging with high-resolution fMRI, it is time to capitalize on this new window into the brain and test which predictions of the various object recognition models are supported by this novel empirical evidence.
Introduction
One of the most complex problems the visual system has to solve is recognizing objects across a wide range of encountered variations. Retinal information about one and the same object can dramatically vary when position, viewpoint, lighting, or distance change, or when the object is partly occluded by other objects. In Computer Vision, there are a variety of models using alignment, invariant properties, or part-decomposition methods (Roberts, 1965; Fukushima, 1982; Marr, 1982; Ullman et al., 2001; Viola and Jones, 2001; Lowe, 2004; Torralba et al., 2008), which are able to identify objects across a range of viewing conditions.
Some computational models are clearly biologically inspired and take, for example, the architecture of the visual system into account (e.g., Wersing and Körner, 2003), or cleverly adapt the concept of a powerful Computer Vision algorithm (e.g., the Fourier-Mellin transform) into a neurobiologically plausible alternative (Sountsov et al., 2011). Such models can successfully detect objects in sets of widely varying natural images (Torralba et al., 2008) and achieve impressive invariance (Sountsov et al., 2011). In general, however, computer vision models are developed for practical image analysis applications (handwriting recognition, face detection, etc.) for which fast and accurate object recognition, not neurobiological validity, is pivotal. Therefore, these models are generally less powerful in explaining how object constancy arises in the human brain. Indeed, "Models are common; good theories are scarce," as suggested by Stevens (2000, p. 1177). Humans are highly skilled at object recognition and outperform machines in such tasks with great ease (Fleuret et al., 2011). This is partly because they are able to strategically use semantics and information from context or memory. In addition, they can direct attention to informative features in the image while ignoring distracting information. Such higher cognitive processes are difficult to implement, but improve object recognition performance when taken into account (Lowe, 2000). Computer vision models might become more accurate in recognizing objects across a wide range of variations in image input when implementing algorithms derived from neurobiological observations.
Reciprocally, our interpretation of such neurobiological findings might be greatly improved by insights in the underlying computational mechanisms. Humans can identify objects with great speed and accuracy, even when the object percept is degraded, occluded or presented in a highly cluttered visual scene (e.g., Thorpe et al., 1996). However, which computational mechanisms enable such remarkable performance is not yet fully understood. To create a comprehensive theory of human object recognition and how it achieves invariant object recognition, computational mechanisms derived from modeling efforts should be incorporated in neuroscientific theories based on experimental findings.
In the current paper, we highlight recent developments in object recognition research and put forward a “Common Brain Space” framework (CBS; Goebel and De Weerd, 2009; Peters et al., 2010) in which empirical data and computational results can be directly integrated and quantitatively compared.
Exploring invariant object recognition in the human visual system
Object recognition, discrimination, and identification are complex tasks. Different encounters with an object are unlikely to take place under identical viewing conditions, requiring the visual system to generalize across changes. Information that is important for retrieving object identity should be effectively processed, while unimportant viewpoint variations should be ignored. That is, the recognition system should be stable yet sensitive (Marr and Nishihara, 1978), leading to inherent tradeoffs. How the visual system accomplishes this task with such apparent ease is not yet understood. There are two classes of theories on object recognition. The first suggests that objects can be recognized by cardinal ("non-accidental") properties that are relatively invariant to the objects' appearance (Marr, 1982; Biederman, 1987). Thus, these invariant properties and their spatial relations should provide sufficient information to recognize objects regardless of their viewpoint. However, how such cardinal properties are defined and recognized in an invariant manner is a complex issue (Tarr and Bülthoff, 1995). The second type of theory suggests that there are no such invariants, but that objects are stored in the view as originally encountered (which, in natural settings, encompasses multiple views sampled within a short time interval), thereby maintaining view-dependent shape and surface information (Edelman and Bülthoff, 1992). Recognition of an object under different viewing conditions is achieved either by computing the quality of matches between the input and stored representations (Perrett et al., 1998; Riesenhuber and Poggio, 1999) or by transforming the input to match the view specifications of the stored representation (Bülthoff and Edelman, 1992). The latter normalization can be accomplished by interpolation (Poggio and Edelman, 1990), mental transformation (Tarr and Pinker, 1989), or alignment (Ullman, 1989).
These theories make very different neural predictions. View-invariant theories suggest that the visual system recognizes objects using a limited library of non-accidental properties, and that neural representations are invariant. Evidence for such invariant object representations has been found at final stages of the visual pathway (Quiroga et al., 2005; Freiwald and Tsao, 2010). In contrast, the second class of theories assumes that neural object representations are view-dependent, with neurons being sensitive to object transformations. Clearly, the early visual system is sensitive to object appearance: the same object can elicit completely different, non-overlapping neural activation patterns when presented at different locations in the visual field. So, object representations are input-specific at initial stages of processing, whereas invariant representations emerge in final stages. However, how objects are represented at intermediate stages of this processing chain is not yet well understood. Multiple different transforms are likely performed (perhaps in parallel) at these stages. This creates multiple object representations, in line with the various types of information (such as position and orientation) that have to be preserved for interaction with objects. Moreover, position information aids invariant object learning (Einhäuser et al., 2005; Li and DiCarlo, 2008, 2010), and representations can reflect view-dependent and view-invariant information simultaneously (Franzius et al., 2011).
The following section reviews evidence from monkey neurophysiology and human neuroimaging on how object perception and recognition are implemented in the primate brain. As already alluded to above, the visual system is hierarchically organized in more than 25 areas (Felleman and Van Essen, 1991), with initial processing of low-level visual information by neurons in the thalamus, striate cortex (V1), and V2, and of more complex features in V3 and V4 (Carlson et al., 2011). Further processing of object information in the human ventral pathway (Ungerleider and Haxby, 1994) involves higher-order visual areas such as the lateral occipital cortex (LOC; Malach, 1995) and object-selective areas for faces ("FFA"; Kanwisher et al., 1997), bodies ("EBA"; Downing et al., 2001), words ("VWFA"; McCandliss et al., 2003), and scenes ("PPA"; Epstein et al., 1999).
The first studies on the neural mechanisms of object recognition were neurophysiological recordings in monkeys. In macaque anterior inferotemporal (IT) cortex, most object-selective neurons are tuned to viewing position (Logothetis et al., 1995; Booth and Rolls, 1998), in line with viewpoint-dependent theories. On the other hand, IT neurons also turned out to be more sensitive to changes in "non-accidental" properties than to equally large pixel-wise changes in other shape features ("metric properties"; Kayaert et al., 2003), providing support for structural description theories (Biederman, 1987). Taken together, these studies provide neural evidence for both theories (see also Rust and DiCarlo, 2010). However, to which degree object representations are stored in an invariant or view-dependent manner across visual areas, and how these representations arise and are matched to incoming information, remains elusive.
Human neuroimaging studies have also not provided conclusive evidence. In fMRI studies, the BOLD signal reflects neural activity at the population rather than the single-cell level. The highest functional resolution provided by standard 3 Tesla MRI scanners is around 2 × 2 × 2 mm³, which is too coarse to zoom into the functional architecture within visual areas. However, more subtle information patterns can be extracted using multi-voxel pattern analysis (MVPA; Haynes et al., 2007) or fMRI-adaptation (fMRI-A; Grill-Spector and Malach, 2001). MVPA can reveal subtle differences in distributed fMRI patterns across voxels resulting from small biases in the distributions of differentially tuned neurons that are sampled by each voxel. By using classification techniques developed in machine learning, distributed spatial patterns of different classes (e.g., different objects) can be successfully discriminated (see Fuentemilla et al., 2010 for a temporal pattern classification example with MEG). For example, changing the position of an object significantly changes patterns in LOC, even more than replacing an object (at the same position) by an object of a different category (Sayres and Grill-Spector, 2008). However, rotating the object (up to 60°) did not change LOC responses (Eger et al., 2008), suggesting that LOC representations might be view-dependent in only some respects. fMRI-A exploits the fact that the neuronal (and the corresponding hemodynamic) response is weaker for repeated compared to novel stimuli (Miller and Desimone, 1994). Thus, areas are sensitive to view-dependent changes when their BOLD response returns to its initial level for objects that are presented a second time, but now from a different viewpoint. This technique has revealed interesting and unexpected findings.
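To illustrate the logic behind MVPA for readers unfamiliar with it, the following sketch runs a minimal split-half, correlation-based pattern classifier on synthetic voxel data. All numbers here (voxel count, noise level, class biases) are invented for illustration and are not taken from any of the studies cited above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 voxels, two object classes, 20 trials each.
# Each class has a weak but consistent spatial bias across voxels,
# buried in trial-by-trial noise -- the situation MVPA exploits.
n_voxels, n_trials = 50, 20
bias_a = rng.normal(0.0, 1.0, n_voxels)
bias_b = rng.normal(0.0, 1.0, n_voxels)
trials_a = bias_a + rng.normal(0.0, 2.0, (n_trials, n_voxels))
trials_b = bias_b + rng.normal(0.0, 2.0, (n_trials, n_voxels))

# Split-half classification: correlate each test pattern with the
# mean training pattern of each class and assign the better match.
train_a, test_a = trials_a[:10].mean(axis=0), trials_a[10:]
train_b, test_b = trials_b[:10].mean(axis=0), trials_b[10:]

def classify(pattern):
    r_a = np.corrcoef(pattern, train_a)[0, 1]
    r_b = np.corrcoef(pattern, train_b)[0, 1]
    return "A" if r_a > r_b else "B"

correct = sum(classify(p) == "A" for p in test_a) + \
          sum(classify(p) == "B" for p in test_b)
accuracy = correct / (len(test_a) + len(test_b))
print(f"decoding accuracy: {accuracy:.2f}")
```

Although each voxel alone carries almost no class information here, pooling the weak biases across many voxels yields decoding well above chance (0.5), which conveys the intuition for why MVPA can detect information below the nominal spatial resolution of the scan.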
For example, a recent study observed viewpoint- and size-dependent coding at intermediate processing stages (V4, V3A, MT, and V7), whereas responses in higher visual areas were view-invariant (Konen and Kastner, 2008). Remarkably, these view-invariant representations were found not only in the ventral (e.g., LOC), but also in the dorsal pathway (e.g., IPS). The dorsal "where/how" or "perception-for-action" pathway is involved in visually guided actions toward objects rather than in identifying objects, which is mainly performed by the ventral or "what" pathway (Goodale and Milner, 1992; Ungerleider and Haxby, 1994). For this role, maintaining view-dependent information in higher dorsal areas seems important; this, however, was not confirmed by the view-invariant results in IPS (but see James et al., 2002). Likewise, another recent study (Dilks et al., 2011) revealed an unexpected tolerance for mirror-reversals of visual scenes in a parahippocampal area thought to play a key role in navigation (e.g., Janzen and van Turennout, 2004) and reorientation (e.g., Epstein and Kanwisher, 1998), functions for which view-dependent information is essential. Furthermore, mixed findings have been reported for the object-selective LOC. For example, different findings on size-, position-, and viewpoint-invariant representations in different subparts of the LOC have been reported (Grill-Spector et al., 1999; James et al., 2002; Vuilleumier et al., 2002; Valyear et al., 2006; Dilks et al., 2011). These divergent findings might be partly related to intricacies inherent to the fMRI-A approach (e.g., Krekelberg et al., 2006), and its sensitivity to the design used (Grill-Spector et al., 2006), varying attention (Vuilleumier et al., 2005), and task demands (e.g., Ewbank et al., 2011). The latter should not be regarded as obscuring confounds, however, since they appear to strongly contribute to our skilled performance.
Object perception is accompanied by cognitive processes supporting fast (e.g., extracting the "gist" of a scene, attentional selection of relevant objects) and accurate (e.g., object verification, semantic interpretation) object identification for subsequent goal-directed use of the object (e.g., grasping; tool use). These processes engage widespread memory- and frontoparietal attention-related areas interacting with object processing in the visual system (Corbetta and Shulman, 2002; Bar, 2004; Ganis et al., 2007). As the involvement of such top-down processes might be particularly pronounced in humans—and weaker or even absent in monkeys and machines, respectively—efforts to integrate computational modeling with human neuroimaging remain essential (see Tagamets and Horwitz, 1998; Corchs and Deco, 2002 for earlier work).
With the advent of ultra-high field fMRI (≥7 Tesla scanners), both the sensitivity (due to increases in signal-to-noise ratio that depend linearly on field strength) and the specificity (due to a stronger contribution of gray-matter microvasculature relative to large draining veins, and fewer partial volume effects) of the acquired signal improve significantly, providing data at a level of detail previously only available via invasive optical imaging in non-human species. The functional visual system can be spatially sampled in the range of hundreds of microns, which is sufficient to resolve activation at the level of cortical columns (Yacoub et al., 2008; Zimmermann et al., 2011) and layers (Polimeni et al., 2010). Given that cortical columns are thought to provide the organizational structure forming computational units involved in visual feature processing (Hubel and Wiesel, 1962; Tanaka, 1996; Mountcastle, 1997), the resolution achievable at ultra-high fields will not only produce more detailed maps, but also has the potential to yield new vistas on within-area operations.
Integration of computational and experimental findings in CBS
The approach we propose is to project the predicted activity in a modeled area onto the corresponding cortical regions where empirical data are collected (Figure 1). By interfacing empirical and simulated data in one anatomical "brain space", direct and quantitative mutual hypothesis testing based on predicted and observed spatiotemporal activation patterns can be achieved. More specifically, modeled units (e.g., cortical columns) are mapped one-to-one to corresponding neuroimaging units (e.g., voxels, vertices) in the empirically acquired brain model (e.g., the cortical gray matter surface). As a result, a running network simulation creates spatiotemporal data directly on a linked brain model, enabling highly specific and accurate comparisons between neuroimaging and neurocomputational data in the temporal as well as the spatial domain. Note that in CBS (as implemented in Neurolator 3D; Goebel, 1993), computational and neuroimaging units can flexibly represent various neural signals (e.g., fMRI, EEG, MEG, fNIRS, or intracranial recordings). Furthermore, both hidden and output layers of the neural network can be projected onto the brain model, providing additional flexibility to the framework, as predicted and observed activations can be compared at multiple selected processing stages simultaneously (see Figure 2 for an example).
Figure 1
Figure 2
To model the human object recognition system, we developed large-scale networks of cortical column units, whose dynamics can reflect spiking activity, integrated synaptic activity, or oscillatory activity (when modeled as burst oscillators), resulting from excitatory and inhibitory synaptic input. To create simulated spatiotemporal patterns, each unit of a network layer (output and/or hidden) is linked to a topographically corresponding patch on a cortical representation via a so-called Network-to-Brain Link (NBL). Via this link, activity of modeling units in the running network is transformed into timecourses of neuroimaging units, spatially organized in an anatomical coordinate system. Importantly, when simulated and measured data co-exist in the same representational space, the same analysis tools (e.g., MVPA, effective connectivity analysis) can be applied to both data sets, allowing for quantitative comparisons (Figure 2). See Peters et al. (2010) for further details.
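As a minimal sketch of what such a Network-to-Brain Link could look like, the snippet below maps simulated unit activity one-to-one onto voxels and convolves it with a hemodynamic response function to produce predicted BOLD timecourses. The double-gamma HRF parameters, the voxel indices, and the activity profiles are our illustrative assumptions, not the Neurolator 3D implementation:

```python
import numpy as np
from math import gamma as gamma_fn

def canonical_hrf(t):
    # Double-gamma hemodynamic response: positive peak around 5-6 s,
    # undershoot around 16 s (illustrative parameter choices).
    def gpdf(t, shape):
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        m = t > 0
        out[m] = t[m] ** (shape - 1) * np.exp(-t[m]) / gamma_fn(shape)
        return out
    return gpdf(t, 6) - gpdf(t, 16) / 6.0

dt = 0.1                                  # simulation time step (s)
h = canonical_hrf(np.arange(0, 32, dt))   # 32 s HRF kernel

# Simulated columnar-unit activity: 4 units, 60 s of simulation,
# each unit active during a different 10 s "stimulus" window.
n_units, n_steps = 4, 600
unit_activity = np.zeros((n_units, n_steps))
for i in range(n_units):
    unit_activity[i, 100 + i * 50 : 200 + i * 50] = 1.0

# Network-to-Brain Link: unit index -> hypothetical voxel index
# in the empirically acquired brain model (one-to-one mapping).
nbl = {0: 1203, 1: 1204, 2: 1377, 3: 1378}

predicted_bold = {}
for unit, voxel in nbl.items():
    # Convolve neural activity with the HRF to predict the voxel timecourse.
    predicted_bold[voxel] = np.convolve(unit_activity[unit], h)[:n_steps] * dt

peak_t = np.argmax(predicted_bold[1203]) * dt
print(f"voxel 1203: predicted BOLD peak at ~{peak_t:.1f} s (neural onset at 10.0 s)")
```

The resulting timecourses live in the same coordinate system as the measured data, so predicted and measured voxel timecourses can be compared directly, voxel by voxel.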
We propose that such a tight integration of neuroimaging and modeling data allows reciprocal fine-tuning and facilitates hypothesis testing at a mechanistic level, as it leads to falsifiable predictions that can subsequently be empirically tested. Importantly, there is a direct topographical correspondence between computational (cortical columnar) units at the model and brain levels. Moreover, comparisons between simulated and empirical data are not limited to activity patterns in output stages (i.e., object-selective areas in anterior IT such as FFA, or even more anterior in putative "face exemplar" regions; Kriegeskorte et al., 2007), but can also be made at intermediate stages (such as V4 and LOC). Interpreting the role of feature representations at intermediate stages may be essential for a comprehensive brain model of object recognition (Ullman et al., 2002).
Studying several stages of the visual hierarchy simultaneously, by quantitatively comparing ongoing visual processes across stages both within and between the simulated and empirically acquired data sets, may help to clarify in several ways how higher-order invariant representations are created from lower-level features. Firstly, this may reveal how object coding changes along the visual pathway. Incoming percepts might be differently transformed and matched to stored object representations at several stages, with view-dependent matching at intermediate stages and matching of only informative properties (Biederman, 1987; Ullman et al., 2001) at later stages. Secondly, monitoring activity patterns at multiple processing stages simultaneously is desirable, given that early stages are influenced by processing in later stages. To facilitate object recognition, invariant information is, for example, fed back from higher to early visual areas (Williams et al., 2008), suggesting that object perception results from a dynamic interplay between visual areas. Finally, it is important to realize that such top-down influences are not limited to areas within the classical visual hierarchy, but also engage brain-wide networks involved in "initial guessing" (Bar et al., 2006), object selection (Serences et al., 2004), context integration (Graboi and Lisman, 2003; Bar, 2004), and object verification (Ganis et al., 2007). Such functions should be incorporated into computational brain models to fully comprehend what makes human object recognition so flexible, fast, and accurate. Modeling higher cognitive functions is challenging in general, but may be aided by considering empirical observations from object perception studies in which the level of top-down processing varies (e.g., Ganis et al., 2007).
The interactions between the visual pathway and the frontoparietal system revealed by such fMRI studies can be compared to simulations at multiple processing stages, allowing more subtle, process-specific fine-tuning of the modeled areas.
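One way to make such stage-wise model-brain comparisons quantitative is representational similarity analysis (Kriegeskorte et al., 2008): compute a representational dissimilarity matrix (RDM) per processing stage for both data sets and correlate the two. The sketch below does this on synthetic patterns; the stage names, unit counts, and noise level are illustrative assumptions, and the "measured" data are simply the simulated patterns plus noise:

```python
import numpy as np

rng = np.random.default_rng(1)

def rdm(patterns):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between the activity patterns evoked by each pair of stimuli.
    return 1.0 - np.corrcoef(patterns)

def compare_rdms(rdm_a, rdm_b):
    # Correlate the upper triangles (excluding the diagonal).
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

n_stimuli = 8
stages = ["V1", "V4", "LOC"]              # hypothetical processing stages
similarity = {}
for stage, n_units in zip(stages, [500, 200, 100]):
    simulated = rng.normal(size=(n_stimuli, n_units))
    # "Measured" data: the simulated representational geometry
    # plus measurement noise, so the two should share structure.
    measured = simulated + rng.normal(scale=0.5, size=simulated.shape)
    similarity[stage] = compare_rdms(rdm(simulated), rdm(measured))

for stage in stages:
    print(f"{stage}: model-brain RDM correlation = {similarity[stage]:.2f}")
```

Because RDMs abstract away from which particular unit or voxel carries the signal, the same comparison works between a model layer and a brain area even though they differ in dimensionality, which is what makes stage-by-stage model evaluation tractable.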
A number of recent fMRI studies have applied en- and decoding techniques developed in the fields of Machine Learning and Computer Vision to interpret their data (Kriegeskorte et al., 2008; Miyawaki et al., 2008; Haxby et al., 2011; Naselaris et al., 2011; see LaConte, 2011 for an extension to Brain-Computer Interfaces), showing that the two fields are starting to approach each other. For example, by summarizing the complex statistical properties of natural images using a computer vision technique, a visual scene percept could be successfully reconstructed from fMRI activity (Naselaris et al., 2009). The trend toward investigating natural vision is noteworthy, given that processing cluttered and dynamic natural visual input, rather than artificially created isolated objects, poses additional challenges to the visual system (Einhäuser and König, 2010). We believe that, now that columnar-level imaging is within reach with the advent of high-resolution fMRI (in combination with the recently developed en- and decoding fMRI methods), the time has come to more directly integrate computational and experimental neuroscience and to test which predictions of the various object recognition models are supported by this new type of empirical evidence.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
This work received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement n° 269853.
References
Bar, M. (2004). Visual objects in context. Nat. Rev. Neurosci. 5, 617–629. doi: 10.1038/nrn1476
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Schmidt, A. M., Dale, A. M., Hämäläinen, M. S., Marinkovic, K., Schacter, D. L., Rosen, B. R., and Halgren, E. (2006). Top-down facilitation of visual recognition. Proc. Natl. Acad. Sci. U.S.A. 103, 449–454. doi: 10.1073/pnas.0507062103
Biederman, I. (1987). Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94, 115–147.
Booth, M. C., and Rolls, E. T. (1998). View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. Cereb. Cortex 8, 510–523. doi: 10.1093/cercor/8.6.510
Bülthoff, H., and Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc. Natl. Acad. Sci. U.S.A. 89, 60–64.
Carlson, E. T., Rasquinha, R. J., Zhang, K., and Connor, C. E. (2011). A sparse object coding scheme in area V4. Curr. Biol. 21, 288–293. doi: 10.1016/j.cub.2011.01.013
Corbetta, M., and Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 201–215. doi: 10.1038/nrn755
Corchs, S., and Deco, G. (2002). Large-scale neural model for visual attention: integration of experimental single-cell and fMRI data. Cereb. Cortex 12, 339–348. doi: 10.1093/cercor/12.4.339
Dilks, D. D., Julian, J. B., Kubilius, J., Spelke, E. S., and Kanwisher, N. (2011). Mirror-image sensitivity and invariance in object and scene processing pathways. J. Neurosci. 31, 11305–11312. doi: 10.1523/JNEUROSCI.1935-11.2011
Downing, P. E., Jiang, Y., Shuman, M., and Kanwisher, N. (2001). A cortical area selective for visual processing of the human body. Science 293, 2470–2473. doi: 10.1126/science.1063414
Edelman, S., and Bülthoff, H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Res. 32, 2385–2400. doi: 10.1016/0042-6989(92)90102-O
Eger, E., Ashburner, J., Haynes, J.-D., Dolan, R. J., and Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. J. Cogn. Neurosci. 20, 356–370. doi: 10.1162/jocn.2008.20019
Einhäuser, W., Hipp, J., Eggert, J., Körner, E., and König, P. (2005). Learning viewpoint invariant object representations using a temporal coherence principle. Biol. Cybern. 93, 79–90. doi: 10.1007/s00422-005-0585-8
Einhäuser, W., and König, P. (2010). Getting real – sensory processing of natural stimuli. Curr. Opin. Neurobiol. 20, 389–395. doi: 10.1016/j.conb.2010.03.010
Epstein, R., Harris, A., Stanley, D., and Kanwisher, N. (1999). The parahippocampal place area: recognition, navigation, or encoding? Neuron 23, 115–125. doi: 10.1016/S0896-6273(00)80758-8
Epstein, R., and Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature 392, 598–601. doi: 10.1038/33402
Ewbank, M. P., Lawson, R. P., Henson, R. N., Rowe, J. B., Passamonti, L., and Calder, A. J. (2011). Changes in 'top-down' connectivity underlie repetition suppression in the ventral visual pathway. J. Neurosci. 31, 5635–5642. doi: 10.1523/JNEUROSCI.5013-10.2011
Felleman, D. J., and Van Essen, D. C. (1991). Distributed hierarchical processing in primate visual cortex. Cereb. Cortex 1, 1–47. doi: 10.1093/cercor/1.1.1-a
Fleuret, F., Li, T., Dubout, C., Wampler, E. K., Yantis, S., and Geman, D. (2011). Comparing machines and humans on a visual categorization test. Proc. Natl. Acad. Sci. U.S.A. 108, 17621–17625. doi: 10.1073/pnas.1109168108
Franzius, M., Wilbert, N., and Wiskott, L. (2011). Invariant object recognition and pose estimation with slow feature analysis. Neural Comput. 23, 2289–2323. doi: 10.1162/NECO_a_00171
Freiwald, W. A., and Tsao, D. Y. (2010). Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330, 845–851. doi: 10.1126/science.1194908
Fuentemilla, L., Penny, W. D., Cashdollar, N., Bunzeck, N., and Düzel, E. (2010). Theta-coupled periodic replay in working memory. Curr. Biol. 20, 606–612. doi: 10.1016/j.cub.2010.01.057
Fukushima, K. (1982). Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recogn. 15, 455–469.
Ganis, G., Schendan, H. E., and Kosslyn, S. M. (2007). Neuroimaging evidence for object model verification theory: role of prefrontal control in visual object categorization. Neuroimage 34, 384–398. doi: 10.1016/j.neuroimage.2006.09.008
Goebel, R. (1993). "Perceiving complex visual scenes: an oscillator neural network model that integrates selective attention, perceptual organisation, and invariant recognition," in Advances in Neural Information Processing Systems, Vol. 5, eds Giles, J., Hanson, C., and Cowan, S. (San Diego, CA: Morgan Kaufmann), 903–910.
Goebel, R., and De Weerd, P. (2009). "Perceptual filling-in: from experimental data to neural network modeling," in The Cognitive Neurosciences, Vol. 6, ed Gazzaniga, M. (Cambridge, MA: MIT Press), 435–456.
Goodale, M. A., and Milner, A. D. (1992). Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25. doi: 10.1016/0166-2236(92)90344-8
Graboi, D., and Lisman, J. (2003). Recognition by top-down and bottom-up processing in cortex: the control of selective attention. J. Neurophysiol. 90, 798–810. doi: 10.1152/jn.00777.2002
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., and Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron 24, 187–203. doi: 10.1016/S0896-6273(00)80832-6
Grill-Spector, K., Henson, R., and Martin, A. (2006). Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn. Sci. 10, 14–23. doi: 10.1016/j.tics.2005.11.006
Grill-Spector, K., and Malach, R. (2001). fMR-adaptation: a tool for studying the functional properties of human cortical neurons. Acta Psychol. (Amst.) 107, 293–321.
Haxby, J. V., Guntupalli, J. S., Connolly, A. C., Halchenko, Y. O., Conroy, B. R., Gobbini, M. I., Hanke, M., and Ramadge, P. J. (2011). A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron 72, 404–416. doi: 10.1016/j.neuron.2011.08.026
Haynes, J.-D., Sakai, K., Rees, G., Gilbert, S., Frith, C., and Passingham, R. E. (2007). Reading hidden intentions in the human brain. Curr. Biol. 17, 323–328. doi: 10.1016/j.cub.2006.11.072
Hubel, D. H., and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106–154.
James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., and Goodale, M. A. (2002). Differential effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron 35, 793–801. doi: 10.1016/S0896-6273(02)00803-6
Janzen, G., and van Turennout, M. (2004). Selective neural representation of objects relevant for navigation. Nat. Neurosci. 7, 673–677. doi: 10.1038/nn1257
Kanwisher, N., McDermott, J., and Chun, M. M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311.
Kayaert, G., Biederman, I., and Vogels, R. (2003). Shape tuning in macaque inferior temporal cortex. J. Neurosci. 23, 3016–3027.
Krekelberg, B., Boynton, G. M., and van Wezel, R. J. (2006). Adaptation: from single cells to BOLD signals. Trends Neurosci. 29, 250–256. doi: 10.1016/j.tins.2006.02.008
Konen, C. S., and Kastner, S. (2008). Two hierarchically organized neural systems for object information in human visual cortex. Nat. Neurosci. 11, 224–231. doi: 10.1038/nn2036
Kriegeskorte, N., Formisano, E., Sorger, B., and Goebel, R. (2007). Individual faces elicit distinct response patterns in human anterior temporal cortex. Proc. Natl. Acad. Sci. U.S.A. 104, 20600–20605. doi: 10.1073/pnas.0705654104
Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., Tanaka, K., and Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141. doi: 10.1016/j.neuron.2008.10.043
LaConte, S. M. (2011). Decoding fMRI brain states in real-time. Neuroimage 56, 440–454. doi: 10.1016/j.neuroimage.2010.06.052
Li, N., and DiCarlo, J. J. (2008). Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321, 1502–1507. doi: 10.1126/science.1160028
Li, N., and DiCarlo, J. J. (2010). Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex. Neuron 67, 1062–1075. doi: 10.1016/j.neuron.2010.08.029
Logothetis, N. K., Pauls, J., and Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5, 552–563. doi: 10.1016/S0960-9822(95)00108-4
Lowe, D. G. (2000). "Towards a computational model for object recognition in IT cortex," in First IEEE International Workshop on Biologically Motivated Computer Vision (Seoul, Korea), 1–11.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. J. Comp. Vis. 60, 91–110.
Malach, R. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proc. Natl. Acad. Sci. U.S.A. 92, 8135–8139.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco, CA: Freeman.
Marr, D., and Nishihara, K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proc. R. Soc. Lond. B Biol. Sci. 200, 269–294.
McCandliss, B. D., Cohen, L., and Dehaene, S. (2003). The visual word form area: expertise for reading in the fusiform gyrus. Trends Cogn. Sci. 7, 293–299. doi: 10.1016/S1364-6613(03)00134-7
Miller, E. K., and Desimone, R. (1994). Parallel neuronal mechanisms for short-term memory. Science 263, 520–522. doi: 10.1126/science.8290960
Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M. A., Morito, Y., Tanabe, H. C., Sadato, N., and Kamitani, Y. (2008). Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron 60, 915–929. doi: 10.1016/j.neuron.2008.11.004
Mountcastle, V. B. (1997). The columnar organization of the neocortex. Brain 120, 701–722. doi: 10.1093/brain/120.4.701
Naselaris, T., Kay, K. N., Nishimoto, S., and Gallant, J. L. (2011). Encoding and decoding in fMRI. Neuroimage 56, 400–410. doi: 10.1016/j.neuroimage.2010.07.073
Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., and Gallant, J. L. (2009). Bayesian reconstruction of natural images from human brain activity. Neuron 63, 902–915. doi: 10.1016/j.neuron.2009.09.006
Perrett, D. I., Oram, M. W., and Ashbridge, E. (1998). Evidence accumulation in cell populations responsive to faces: an account of generalisation of recognition without mental transformations. Cognition 67, 111–145.
Peters, J. C., Jans, B., van de Ven, V., De Weerd, P., and Goebel, R. (2010). Dynamic brightness induction in V1: analyzing simulated and empirically acquired fMRI data in a "common brain space" framework. Neuroimage 52, 973–984. doi: 10.1016/j.neuroimage.2010.03.070
Poggio, T., and Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature 343, 263–266. doi: 10.1038/343263a0
Polimeni, J. R., Fischl, B., Greve, D. N., and Wald, L. L. (2010). Laminar analysis of 7T BOLD using an imposed activation pattern in human V1. Neuroimage 52, 1334–1346. doi: 10.1016/j.neuroimage.2010.05.005
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., and Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature 435, 1102–1107. doi: 10.1038/nature03687
Riesenhuber, M., and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025. doi: 10.1038/14819
Roberts, L. G. (1965). "Machine perception of 3-D solids," in Optical and Electro-Optical Information Processing, ed Tippet, J. T. (Cambridge, MA: MIT Press), 159–197.
Rust, N. C., and DiCarlo, J. J. (2010). Selectivity and tolerance ("invariance") both increase as visual information propagates from cortical area V4 to IT. J. Neurosci. 30, 12978–12995. doi: 10.1523/JNEUROSCI.0179-10.2010
Sayres, R., and Grill-Spector, K. (2008). Relating retinotopic and object-selective responses in human lateral occipital cortex. J. Neurophysiol. 100, 249. doi: 10.1152/jn.01383.2007
67
SerencesJ. T.SchwarzbachJ.CourtneyS. M.GolayX.YantisS. (2004). Control of object-based attention in human cortex. Cereb. Cortex14, 1346–1357. 10.1093/cercor/bhh095
68
SountsovP.SantucciD. M.LismanJ. E. (2011). A biologically plausible transform for visual recognition that is invariant to translation, scale, and rotation. Front. Comput. Neurosci. 5:53. 10.3389/fncom.2011.00053
69
Stevens, C. (2000). Models are common; good theories are scarce. Nat. Neurosci. 3, 1177.
Tagamets, M. A., and Horwitz, B. (1998). Integrating electrophysiological and anatomical experimental data to create a large-scale model that simulates a delayed match-to-sample human brain imaging study. Cereb. Cortex 8, 310–320. doi: 10.1093/cercor/8.4.310
Tanaka, K. (1996). Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109–139. doi: 10.1146/annurev.ne.19.030196.000545
Tarr, M. J., and Bülthoff, H. (1995). Is human object recognition better described by geon-structural-descriptions or by multiple-views? J. Exp. Psychol. Hum. Percept. Perform. 21, 1494–1505.
Tarr, M. J., and Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cogn. Psychol. 21, 233–282. doi: 10.1016/0010-0285(89)90009-1
Thorpe, S., Fize, D., and Marlot, C. (1996). Speed of processing in the human visual system. Nature 381, 520–522. doi: 10.1038/381520a0
Torralba, A., Fergus, R., and Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1958–1970. doi: 10.1109/TPAMI.2008.128
Ullman, S. (1989). Aligning pictorial descriptions: an approach to object recognition. Cognition 32, 193–254.
Ullman, S., Sali, E., and Vidal-Naquet, M. (2001). "A fragment-based approach to object representation and classification," in International Workshop on Visual Form, eds A. Arcelli, L. P. Cordella, and G. Sanniti di Baja (Berlin: Springer), 85–100.
Ullman, S., Vidal-Naquet, M., and Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nat. Neurosci. 5, 682–687. doi: 10.1038/nn870
Ungerleider, L. G., and Haxby, J. V. (1994). "What" and "where" in the human brain. Curr. Opin. Neurobiol. 4, 157–165.
Valyear, K. F., Culham, J. C., Sharif, N., Westwood, D., and Goodale, M. A. (2006). A double dissociation between sensitivity to changes in object identity and object orientation in the ventral and dorsal visual streams: a human fMRI study. Neuropsychologia 44, 218–228. doi: 10.1016/j.neuropsychologia.2005.05.004
Viola, P., and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Comput. Vis. Pattern Recog. 1, I-511–I-518.
Vuilleumier, P., Henson, R. N., Driver, J., and Dolan, R. J. (2002). Multiple levels of visual object constancy revealed by event-related fMRI of repetition priming. Nat. Neurosci. 5, 491–499. doi: 10.1038/nn839
Vuilleumier, P., Schwartz, S., Dolan, R. J., and Driver, J. (2005). Selective attention modulates neural substrates of repetition priming and "implicit" visual memory: suppressions and enhancements revealed by fMRI. J. Cogn. Neurosci. 17, 1245–1260. doi: 10.1162/0898929055002409
Wersing, H., and Körner, E. (2003). Learning optimized features for hierarchical models of invariant object recognition. Neural Comput. 15, 1559–1588. doi: 10.1162/089976603321891800
Williams, M. A., Baker, C. I., Op de Beeck, H. P., Shim, W. M., Dang, S., Triantafyllou, C., and Kanwisher, N. (2008). Feedback of visual object information to foveal retinotopic cortex. Nat. Neurosci. 11, 1439–1445. doi: 10.1038/nn.2218
Yacoub, E., Harel, N., and Ugurbil, K. (2008). High-field fMRI unveils orientation columns in humans. Proc. Natl. Acad. Sci. U.S.A. 105, 10607–10612. doi: 10.1073/pnas.0804110105
Zimmermann, J., Goebel, R., De Martino, F., van de Moortele, P.-F., Feinberg, D., Adriany, G., Chaimow, D., Shmuel, A., Uğurbil, K., and Yacoub, E. (2011). Mapping the organization of axis of motion selective features in human area MT using high-field fMRI. PLoS One 6:e28716. doi: 10.1371/journal.pone.0028716
Keywords
object perception, view-invariant object recognition, neuroimaging, large-scale neuromodeling, (high-field) fMRI, multimodal data integration
Citation
Peters JC, Reithler J and Goebel R (2012) Modeling invariant object processing based on tight integration of simulated and empirical data in a Common Brain Space. Front. Comput. Neurosci. 6:12. doi: 10.3389/fncom.2012.00012
Received
31 October 2011
Accepted
24 February 2012
Published
09 March 2012
Volume
6 - 2012
Edited by
Evgeniy Bart, Palo Alto Research Center, USA
Reviewed by
Peter König, University of Osnabrück, Germany; John Lisman, Brandeis University, USA
Copyright
© 2012 Peters, Reithler and Goebel.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
*Correspondence: Judith C. Peters, Department Neuroimaging and Neuromodeling, Netherlands Institute for Neuroscience, Meibergdreef 47, 1105 BA, Amsterdam, Netherlands. e-mail: j.peters@nin.knaw.nl
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.