Event Abstract

On the connections between SIFT and biological vision

  • 1 UCSD, Statistical Visual Computing Laboratory, United States

In the past decade, research in object recognition has firmly established the efficacy of image representations based on histograms of dominant gradient orientation. The SIFT descriptor, in particular, could be considered today’s default (low-level) representation for object recognition, adopted by hundreds of computer vision papers. It is heavily inspired by known computations of the early visual cortex, but has no formal, detailed connection to computational neuroscience. Simultaneously, a seminal development in computational neuroscience research has been to explain the ability of individual cells to adapt their dynamic range to the strength of the visual stimulus through gain control implemented by divisive normalization. In this work, we propose a novel representation of local image orientation which shows that these two apparently disjoint developments are, in fact, tightly coupled. We start by formulating the central motivating question for descriptors such as SIFT or HOG, "how to represent locally dominant image orientation", as a decision-theoretic problem. An orientation is defined as dominant, at a location of the visual field, if its Gabor response at that location is both (1) distinct from that of other orientations and (2) large. An optimal statistical test is then derived to determine if an orientation response is distinct. The core of this test is the posterior probability of each orientation at a location, given its Gabor response. The dominance of an orientation within a neighborhood R is then defined as the expected strength of the responses in R that are distinct. Exploiting known properties of natural image statistics, we then show that this measure of orientation dominance, denoted as bioSIFT, can be computed with the sequence of operations of the standard neurophysiological model: simple cells composed of a linear filter, divisive normalization, and a saturating non-linearity, and complex cells that implement spatial pooling.
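The four-stage sequence named above (linear filter, divisive normalization, saturating non-linearity, spatial pooling) can be illustrated with a minimal Python sketch. It is not the bioSIFT derivation itself, which the abstract does not spell out. The Gabor parameters, the additive semi-saturation constant, the tanh non-linearity, and all function names are assumptions chosen only for illustration.

```python
import math

def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0):
    """Zero-mean, even-symmetric Gabor kernel (size x size); parameters are illustrative."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # axis of modulation
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_response(image, kernel, cy, cx):
    """Stage 1: magnitude of the linear filter response centred at (cy, cx)."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += kernel[dy + half][dx + half] * image[cy + dy][cx + dx]
    return abs(total)

def normalize(responses, semi_saturation=0.1):
    """Stage 2: divide each response by the pooled activity of all orientations.
    The additive constant is an assumed, generic form of divisive normalization."""
    denom = semi_saturation + sum(responses)
    return [r / denom for r in responses]

def saturate(value, gain=4.0):
    """Stage 3: saturating non-linearity (tanh is an assumption, not the authors' choice)."""
    return math.tanh(gain * value)

def pooled_descriptor(image, thetas, region, size=7):
    """Stage 4: complex-cell style pooling (here, an average) over a neighborhood R."""
    kernels = [gabor_kernel(size, t) for t in thetas]
    pooled = [0.0] * len(thetas)
    for cy, cx in region:
        raw = [filter_response(image, k, cy, cx) for k in kernels]
        simple = [saturate(v) for v in normalize(raw)]
        for i, v in enumerate(simple):
            pooled[i] += v
    return [p / len(region) for p in pooled]

# Usage: a grating whose intensity varies along x only.
image = [[math.cos(2.0 * math.pi * x / 4.0) for x in range(16)] for _ in range(16)]
thetas = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
region = [(y, x) for y in range(6, 10) for x in range(6, 10)]
descriptor = pooled_descriptor(image, thetas, region)
```

In this toy example, the entry of `descriptor` for the orientation matched to the grating (`thetas[0]`) is the largest, and the other orientations are suppressed by the shared normalization pool.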
This connection between computer vision and neuroscience provides additional justification for both the success of SIFT in computer vision and the importance of divisive normalization in the brain. It also points to the importance of contrast normalization in vision. To illustrate this, we show that simply replacing non-normalized Gabor filter responses with the normalized orientation descriptors of bioSIFT produces very significant gains in the recognition accuracy of the HMAX network, a biologically inspired object recognition architecture. The enhanced network outperforms the previous best HMAX results in the literature, and its performance is competitive with that of comparable state-of-the-art non-biological recognition architectures. The proposed descriptor is also shown to exhibit the trademark properties of V1 neurons, such as independence, sparseness, cross-orientation suppression, and a contrast response that fits the Naka-Rushton equation. The independence properties are not exploited by current SIFT-based recognition architectures, which rely on a computationally expensive probabilistic representation (visual words) of feature dependence. We illustrate the potential of bioSIFT for computationally efficient classification by designing a gist classifier that exploits feature independence. This classifier is shown to perform well on a gist-based image classification task.
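For readers unfamiliar with the Naka-Rushton equation and cross-orientation suppression mentioned above, the short Python sketch below shows their standard textbook forms. The parameter values (`r_max`, `c50`, `n`) and the function names are assumptions for illustration. The abstract does not report fitted values, and this is not the authors' fitting procedure.

```python
def naka_rushton(contrast, r_max=1.0, c50=0.2, n=2.0):
    """Naka-Rushton contrast response: R(c) = r_max * c^n / (c^n + c50^n).
    c50 is the semi-saturation contrast, at which the response is r_max / 2."""
    c_n = contrast ** n
    return r_max * c_n / (c_n + c50 ** n)

def normalized_response(drive, mask_drive, c50=0.2, n=2.0):
    """Response to a preferred-orientation drive when a differently oriented mask
    contributes only to the normalization pool (a generic divisive-normalization form)."""
    return drive ** n / (c50 ** n + drive ** n + mask_drive ** n)

# Usage: the response saturates with contrast, and a superimposed mask suppresses it.
half_max = naka_rushton(0.2)
alone = normalized_response(0.5, 0.0)
with_mask = normalized_response(0.5, 0.5)
```

Here `half_max` equals half of `r_max` because the contrast equals `c50`, and `with_mask` is smaller than `alone`, which is the qualitative signature of cross-orientation suppression.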

Conference: Computational and Systems Neuroscience 2010, Salt Lake City, UT, United States, 25 Feb - 2 Mar, 2010.

Presentation Type: Poster Presentation

Topic: Poster session I

Citation: Muralidharan K and Vasconcelos N (2010). On the connections between SIFT and biological vision. Front. Neurosci. Conference Abstract: Computational and Systems Neuroscience 2010. doi: 10.3389/conf.fnins.2010.03.00068

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 19 Feb 2010; Published Online: 19 Feb 2010.

* Correspondence: Kritika Muralidharan, UCSD, Statistical Visual Computing Laboratory, La Jolla, United States, krmurali@ucsd.edu